MarkLogic Server includes functions that enable applications to provide spelling capabilities. Spelling applications use dictionary documents to find possible corrections for words that a user may have misspelled. A common example is an application that prompts the user about words that might be misspelled. For example, if a user enters a search for the word albetros, an application that uses the spelling correction functions might ask whether the user meant albatross.
The spelling correction functions enable you to create applications that check whether words are spelled correctly. They use one or more dictionaries that you load into the database, and they check words against the dictionary you specify. You control exactly which words are in each dictionary. There are functions to manage dictionaries, check spelling, and suggest words for misspellings.
The spell:double-metaphone / spell.doubleMetaphone and spell:levenshtein-distance / spell.levenshteinDistance functions return the raw values from which spell:suggest / spell.suggest, spell:suggest-detailed / spell.suggestDetailed, and spell:is-correct / spell.isCorrect calculate their results.
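As a minimal sketch, assuming an XQuery 1.0-ml query in which these two functions are available as built-ins (so no module import is shown), you can inspect the raw values directly. The word pair comes from the albetros example above. No output is listed here because the exact values depend on the server.

```xquery
xquery version "1.0-ml";

(: Raw values behind the suggestion functions. :)
(
  (: edit distance between the typed word and a candidate word :)
  spell:levenshtein-distance("albetros", "albatross"),
  (: phonetic keys for each word; similar-sounding words tend to share keys :)
  spell:double-metaphone("albetros"),
  spell:double-metaphone("albatross")
)
```

Comparing these values for a few word pairs can help explain why a particular suggestion ranks where it does.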
You can use these documents or create your own dictionaries. You can also use the spell:make-dictionary / spell.makeDictionary spelling management function to create a dictionary document, and then use spell:load / spell.load to load the dictionary into the database.
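The following is a sketch of building a small dictionary in a query, not a definitive recipe. It stores the result with spell:insert rather than spell:load, on the assumption that spell:insert accepts the in-memory dictionary that spell:make-dictionary returns, while spell:load reads from a file path. The URI /mySpell/birds.xml and the word list are hypothetical.

```xquery
xquery version "1.0-ml";
import module namespace spell = "http://marklogic.com/xdmp/spell"
  at "/MarkLogic/spell.xqy";

(: Hypothetical word list; replace with the words your application needs. :)
let $words := ("albatross", "pelican", "sparrow")
(: Build the dictionary in memory, then store it at a hypothetical URI. :)
return spell:insert("/mySpell/birds.xml", spell:make-dictionary($words))
```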
The spelling lookup functions (spell:is-correct / spell.isCorrect, spell:suggest / spell.suggest, and spell:suggest-detailed / spell.suggestDetailed) are sensitive to the case of the word you pass in. For example, spell:suggest("THe") will return The, Thee, They, and so on as suggestions, while spell:suggest("tHe") will give the, thee, they, and so on. In other words, if you capitalize the first letter of the argument to the spell:suggest / spell.suggest function, the suggestions will all begin with a capital letter. Otherwise, you will get lowercase suggestions.
If you want your applications to ignore case, create a dictionary that contains only lowercase words, and lowercase the word argument of every spell:is-correct / spell.isCorrect and spell:suggest / spell.suggest call (using the XQuery fn:lower-case function, for example) before submitting your queries.
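A case-insensitive check might look like the following sketch. It assumes a dictionary that contains only lowercase words at the hypothetical URI /mySpell/lowercase.xml; the input word is also made up for illustration.

```xquery
xquery version "1.0-ml";

(: Hypothetical all-lowercase dictionary and user input. :)
let $dict := "/mySpell/lowercase.xml"
let $input := "Albetros"
(: Normalize the case before any lookup. :)
let $word := fn:lower-case($input)
return
  if (spell:is-correct($dict, $word))
  then $word                        (: already a dictionary word :)
  else spell:suggest($dict, $word)  (: lowercase suggestions :)
```

Because both the dictionary and the argument are lowercase, the capitalization the user typed no longer affects the result.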
You can have any number of dictionary documents in a database. You can also add to or modify any dictionary documents that already exist. This section describes how to load and update dictionary documents.
A dictionary must be in the database before you can use it in a query. To load a dictionary document using XQuery, use the spell:load function or the spell:insert function. For example, to load a dictionary document with the URI /mySpell/spell.xml, execute a query similar to the following:
xquery version "1.0-ml";
import module "http://marklogic.com/xdmp/spell" at "/MarkLogic/spell.xqy";

spell:load("c:\dictionaries\spell.xml", "/mySpell/spell.xml")
This XQuery adds all of the <word> elements from the c:\dictionaries\spell.xml file to a dictionary with the URI /mySpell/spell.xml. If the document already exists, it is overwritten with the new content from the specified file.
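To load a JSON dictionary document, for example one with the URI /mySpell/spell.json, use the corresponding JavaScript functions, spell.load or spell.insert, in a similar program.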
You can download a sample dictionary from the MarkLogic Community site. The community site links to GitHub, which has small, medium, and large versions of the dictionary. Once you download a dictionary XML file, you can load it as a dictionary document using the spell:load function.
Choose the large-dictionary.xml file (or any other dictionary documents that might be available). The large dictionary has approximately 100,000 words and is about 3 MB to download. Alternatively, you can choose the large-dictionary.json file to load a JSON dictionary.
Save <size>-dictionary.xml (or the corresponding JSON document) to a file.
After the dictionary is loaded, you can use its URI with the spelling correction module functions.
The transactional unit in MarkLogic Server is a query. Therefore, if you are performing multiple updates to the same dictionary document, be sure to perform those updates as part of separate queries. In XQuery, you can place a semicolon between the update statements to start a new query (and therefore a new transaction). If you use a semicolon to start new queries that use spelling correction functions in XQuery, each query must include the import statement in its prolog to resolve the spelling module.
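As a sketch of the semicolon-separated form, the following runs two updates against the same dictionary as two transactions. It assumes the dictionary already exists at /mySpell/spell.xml (the URI from the spell:load example); the added words are hypothetical. Note that the import is repeated after the semicolon.

```xquery
xquery version "1.0-ml";
import module namespace spell = "http://marklogic.com/xdmp/spell"
  at "/MarkLogic/spell.xqy";

(: First transaction: add one word. :)
spell:add-word("/mySpell/spell.xml", "pelican");

xquery version "1.0-ml";
import module namespace spell = "http://marklogic.com/xdmp/spell"
  at "/MarkLogic/spell.xqy";

(: Second transaction: a separate update to the same dictionary. :)
spell:add-word("/mySpell/spell.xml", "sparrow")
```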
The following XQuery uses the spell:add-word function to add an entry for albatross to the dictionary with URI /mySpell/Spell
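A minimal sketch of such a query follows. It assumes the dictionary was loaded at /mySpell/spell.xml, as in the spell:load example earlier; substitute the URI of your own dictionary.

```xquery
xquery version "1.0-ml";
import module namespace spell = "http://marklogic.com/xdmp/spell"
  at "/MarkLogic/spell.xqy";

(: Add the word to an existing dictionary document (assumed URI). :)
spell:add-word("/mySpell/spell.xml", "albatross")
```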
The following XQuery uses the spell:remove-word function to remove the entry for albatross from the dictionary with URI /mySpell/Spell
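A minimal sketch of the removal, again assuming the dictionary lives at /mySpell/spell.xml as in the spell:load example earlier:

```xquery
xquery version "1.0-ml";
import module namespace spell = "http://marklogic.com/xdmp/spell"
  at "/MarkLogic/spell.xqy";

(: Remove the word from an existing dictionary document (assumed URI). :)
spell:remove-word("/mySpell/spell.xml", "albatross")
```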
The suggestions are returned in ranked order, with the first word being the one most likely to be the intended spelling. Your application can then ask the user whether one of the suggested words is the word they intended.
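One way an application might turn the ranked list into a prompt is sketched below. The dictionary URI is assumed to be /mySpell/spell.xml from the earlier load example, and the input word and message wording are illustrative only.

```xquery
xquery version "1.0-ml";

let $dict := "/mySpell/spell.xml"  (: assumed dictionary URI :)
let $word := "albetros"            (: word entered by the user :)
return
  if (spell:is-correct($dict, $word))
  then ()  (: spelled correctly, nothing to prompt :)
  else
    let $suggestions := spell:suggest($dict, $word)
    return
      if (fn:exists($suggestions))
      (: the first suggestion is the highest-ranked one :)
      then fn:concat("Did you mean ", $suggestions[1], "?")
      else fn:concat("No suggestions found for ", $word)
```

An application could also show several of the top-ranked suggestions instead of only the first.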