MarkLogic Server includes functions that enable applications to provide spelling capabilities. Spelling applications use dictionary documents to find possible corrections for words entered by a user. A common example application prompts a user about words that might be misspelled. For example, if a user enters a search for the word albetros, an application that uses the spelling correction functions might ask whether the user means albatross.
This chapter describes how to use the spelling correction functions and contains the following sections:
The spelling correction functions enable you to create applications that check whether words are spelled correctly. They use one or more dictionaries that you load into the database, and they check words against the dictionary you specify. You control which words are in each dictionary. There are functions to manage the dictionaries, check spelling, and suggest words for misspellings.
The reference information for the spelling module functions is included in the MarkLogic XQuery and XSLT Function Reference and the MarkLogic Server-Side JavaScript Function Reference available through docs.marklogic.com. The spelling functions are divided into the following categories:
The spelling correction functions are built-in functions and do not require the import module statement in the XQuery prolog. The following are the spelling correction functions:
XQuery Function | Server-Side JavaScript Function |
---|---|
spell:is-correct | spell.isCorrect |
spell:suggest | spell.suggest |
spell:suggest-detailed | spell.suggestDetailed |
spell:double-metaphone | spell.doubleMetaphone |
spell:levenshtein-distance | spell.levenshteinDistance |
The spell:double-metaphone / spell.doubleMetaphone and spell:levenshtein-distance / spell.levenshteinDistance functions return the raw values from which spell:suggest / spell.suggest, spell:suggest-detailed / spell.suggestDetailed, and spell:is-correct / spell.isCorrect calculate their values.
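To give a feel for the kind of raw value an edit-distance function reports, here is a minimal sketch of the textbook Levenshtein algorithm in plain JavaScript. It is an illustration only: the function name `levenshtein` is hypothetical, and nothing here is taken from MarkLogic's implementation, whose results may differ from this sketch.

```javascript
// Illustrative only: a textbook dynamic-programming edit distance.
// This is NOT MarkLogic's implementation of spell.levenshteinDistance.
function levenshtein(a, b) {
  // prev[j] = distance between the first (i - 1) characters of a
  // and the first j characters of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i characters of a into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete a character from a
        curr[j - 1] + 1,    // insert a character into a
        prev[j - 1] + cost  // substitute, or keep a matching character
      );
    }
    prev = curr;
  }
  return prev[b.length];
}
```

With this sketch, albetros and albatross are two edits apart (one substitution and one insertion), which suggests why a small distance makes a dictionary word a plausible suggestion.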
The difference between spell:suggest (JavaScript spell.suggest) and spell:suggest-detailed (JavaScript spell.suggestDetailed) is that spell:suggest-detailed provides some of the information used in calculating the suggestions, and it returns a report (an XML representation in XQuery and an array of objects in JavaScript), whereas spell:suggest returns a sequence of suggested words. For most spelling applications, spell:suggest is sufficient, but if you want finer control of the suggestions you provide (for example, if you want to calculate your own order of returning the suggestions), you can use spell:suggest-detailed and then filter on some of the criteria returned in its XML or JSON output.
There is an XQuery module to perform management of dictionary documents. You can use this module in either XQuery or in Server-Side JavaScript. The spelling correction dictionary management functions are installed into the following XQuery module file:
where install_dir is the directory in which MarkLogic Server is installed. The functions in the spelling module use the spell: namespace prefix, which is predefined in the server.
To use the functions in this module in an XQuery program, include the module declaration in the prolog of your XQuery program as follows:
import module namespace spell = "http://marklogic.com/xdmp/spell" at "/MarkLogic/spell.xqy";
To use the functions in this module in a JavaScript program, include a line similar to the following in your Server-Side JavaScript program:
const spell = require("/MarkLogic/spell");
There are two types of dictionary documents you can load into MarkLogic:
There are sample XML and JSON dictionary documents available at the following GitHub repository:
https://github.com/marklogic/dictionaries
You can use these documents or create your own dictionaries. You can also use the spell:make-dictionary / spell.makeDictionary spelling management function to create a dictionary document, and then use spell:load / spell.load to load the dictionary into the database.
Any XML dictionary documents loaded into MarkLogic must have the following basic structure:
<dictionary xmlns="http://marklogic.com/xdmp/spell">
<metadata>
</metadata>
<word></word>
<word></word>
......
</dictionary>
The <metadata> element is optional. Use spell:make-dictionary / spell.makeDictionary and spell:load / spell.load to create your own dictionary documents.
Any JSON dictionary documents loaded into MarkLogic must have the following basic structure:
{
"metadata": { ... },
"words": ["word1", "word2", ... ]
}
The metadata property is optional. Use spell:make-dictionary / spell.makeDictionary and spell:load / spell.load to create your own dictionary documents.
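As a rough illustration of the JSON structure above, the following sketch builds a plain object with a `words` array and an optional `metadata` property from a list of words. The helper name `buildDictionaryObject` is hypothetical and is not part of the spelling module. It only shapes data, and it does not load anything into a database.

```javascript
// Hypothetical helper: shapes a word list into the documented JSON structure
// { "metadata": {...}, "words": [...] }. It does not call any MarkLogic API.
function buildDictionaryObject(words, metadata) {
  const cleaned = words
    .map((w) => w.trim())          // drop surrounding whitespace
    .filter((w) => w.length > 0);  // skip empty entries
  // Remove duplicates but keep case: lookups are case-sensitive,
  // so "The" and "the" are distinct dictionary entries.
  const unique = Array.from(new Set(cleaned)).sort();
  // metadata is optional, so include it only when supplied.
  return metadata === undefined
    ? { words: unique }
    : { metadata, words: unique };
}
```

Whether an object like this can be passed directly to spell.insert, or should first be serialized and loaded with spell.load, is best confirmed in the function reference.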
The spelling lookup functions (spell:is-correct, spell:suggest, and spell:suggest-detailed in XQuery; spell.isCorrect, spell.suggest, and spell.suggestDetailed in JavaScript) are case-sensitive, so case is important for words in a dictionary. Additionally, there are some special rules to handle the first character in a spelling lookup. The following are the capitalization rules for the spelling correction functions:
spell:suggest("THe") will return The, Thee, They, and so on as suggestions, while spell:suggest("tHe") will give the, thee, they, and so on. In other words, if you capitalize the first letter of the argument to the spell:suggest / spell.suggest function, the suggestions will all begin with a capital letter. Otherwise, you will get lowercase suggestions. If you want your applications to ignore case, then you should create a dictionary with all lowercase words and lowercase (using the XQuery fn:lower-case function, for example) the word arguments of all spell:is-correct / spell.isCorrect and spell:suggest / spell.suggest functions before submitting your queries.
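The case-insensitive approach described above can be sketched as a small wrapper that lowercases the word before the lookup. In this sketch, `suggestFn` is a hypothetical callback standing in for the real lookup (inside MarkLogic it might be something like `(w) => spell.suggest("/mySpell/spell.xml", w)` against an all-lowercase dictionary). Re-capitalizing the results to match the user's first letter is an optional extra added here for illustration, not something the server does in this mode.

```javascript
// Hypothetical wrapper around a suggestion lookup against an
// all-lowercase dictionary. suggestFn is supplied by the caller.
function suggestIgnoringCase(suggestFn, word) {
  // Lowercase the argument so it matches the all-lowercase dictionary.
  const raw = suggestFn(word.toLowerCase());
  // Accept arrays, other iterables, or null/undefined for "no results".
  const suggestions = Array.from(raw || []);

  // Optional: if the user typed a leading capital, capitalize each suggestion.
  const first = word.charAt(0);
  const startsUpper = first !== first.toLowerCase();
  return startsUpper
    ? suggestions.map((s) => s.charAt(0).toUpperCase() + s.slice(1))
    : suggestions;
}
```

Passing the lookup in as a function keeps the wrapper independent of any particular dictionary URI and makes it easy to exercise with a stub.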
You can have any number of dictionary documents in a database. You can also add to or modify any dictionary documents that already exist. This section describes how to load and update dictionary documents, and contains the following topics:
To use a dictionary in a query, it must be in the database. To load a dictionary document using XQuery, use the spell:load function or the spell:insert function. For example, to load a dictionary document with a URI /mySpell/spell.xml, execute a query similar to the following:
xquery version "1.0-ml";
import module "http://marklogic.com/xdmp/spell" at "/MarkLogic/spell.xqy";
spell:load("c:\dictionaries\spell.xml", "/mySpell/spell.xml")
This XQuery adds all of the <word> elements from the c:\dictionaries\spell.xml file to a dictionary with the URI /mySpell/spell.xml. If the document already exists, then it is overwritten with the new content from the specified file.
To use a dictionary in a query, it must be in the database. To load a dictionary document using JavaScript, use the spell.load function or the spell.insert function. For example, to load a dictionary document with a URI /mySpell/spell.json, execute a program similar to the following:
const spell = require("/MarkLogic/spell");
declareUpdate();
spell.load("c:/dictionaries/spell.json", "/mySpell/spell.json");
This loads the file at the specified path into the dictionary JSON document at the specified URI.
You can download a sample dictionary from the MarkLogic Community site. The community site links to GitHub, which has small, medium, and large versions of the dictionary. Once you download a dictionary XML file, you can load it as a dictionary document using the spell:load function.
Perform the following steps to download and load a sample dictionary:
1. Go to http://developer.marklogic.com/code/#dictionaries.
2. Follow the link to the GitHub repository at https://github.com/marklogic/dictionaries.
3. Choose the small-dictionary.xml, medium-dictionary.xml, or large-dictionary.xml file (or any other dictionary documents that might be available). The large dictionary has approximately 100,000 words and is about 3 MB to download. Alternatively, you can choose the small-dictionary.json, medium-dictionary.json, or large-dictionary.json file to load a JSON dictionary.
4. Save the <size>-dictionary.xml (or the corresponding JSON document) to a file (for example, c:\dictionaries\spell.xml).
5. Load the file with a query similar to the following:
xquery version "1.0-ml";
import module "http://marklogic.com/xdmp/spell" at "/MarkLogic/spell.xqy";
spell:load("c:\dictionaries\spell.xml", "/mySpell/spell.xml")
6. The dictionary is loaded as a document with the URI /mySpell/spell.xml. You can now use this URI with the spelling correction module functions.
Use the following dictionary functions to modify existing dictionary documents:
The spell:insert XQuery function or the spell.insert JavaScript function overwrites an existing dictionary if you specify the URI of an existing dictionary document, and creates a new dictionary if none exists at the specified URI.
The transactional unit in MarkLogic Server is a query; therefore, if you are performing multiple updates to the same dictionary document, be sure to perform those updates as part of separate queries. In XQuery, you can place a semicolon between the update statements to start a new query (and therefore a new transaction). If you use a semicolon to start new queries that use spelling correction functions in XQuery, each query must include the import statement in its prolog to resolve the spelling module.
The following topics are about updating dictionary documents:
The following XQuery uses the spell:add-word function to add an entry for albatross to the dictionary with URI /mySpell/spell.xml:
xquery version "1.0-ml";
import module "http://marklogic.com/xdmp/spell" at "/MarkLogic/spell.xqy";
spell:add-word("/mySpell/spell.xml", "albatross")
If the /mySpell/spell.xml dictionary has an identical entry, there will be no change to the dictionary. Otherwise, an entry for albatross is added to the dictionary.
The following JavaScript program uses the spell.addWord function to add an entry for albatross to the dictionary with URI /mySpell/spell.json:
const spell = require("/MarkLogic/spell.xqy");
declareUpdate();
spell.addWord("/mySpell/spell.json", "albatross");
If the /mySpell/spell.json dictionary has an identical entry, there will be no change to the dictionary. Otherwise, an entry for albatross is added to the dictionary.
The following XQuery uses the spell:remove-word function to remove the entry for albatross from the dictionary with URI /mySpell/spell.xml:
xquery version "1.0-ml";
import module "http://marklogic.com/xdmp/spell" at "/MarkLogic/spell.xqy";
spell:remove-word("/mySpell/spell.xml", "albatross")
This removes the word albatross from the /mySpell/spell.xml dictionary document.
The following JavaScript program uses the spell.removeWord function to remove the entry for albatross from the dictionary with URI /mySpell/spell.json:
const spell = require("/MarkLogic/spell.xqy");
declareUpdate();
spell.removeWord("/mySpell/spell.json", "albatross");
This removes the word albatross from the /mySpell/spell.json dictionary document.
Dictionary documents are stored in XML or JSON format in the database. Therefore, they can be queried just like any other document. Note the following about security and dictionary documents:
xdmp:document-set-permissions (JavaScript xdmp.documentSetPermissions) after a spell:load / spell.load operation.
You can use the spell:is-correct XQuery function or the spell.isCorrect JavaScript function to see if a word is spelled correctly (according to the specified dictionary). Consider the following XQuery statement:
spell:is-correct("/mySpell/spell.xml", "alphabet")
This returns true because the word alphabet is spelled correctly. The following is the equivalent in JavaScript:
spell.isCorrect("/mySpell/spell.xml", "alphabet");
Now consider the following XQuery statement:
spell:is-correct("/mySpell/spell.xml", "alfabet")
This returns false because the word alfabet is not spelled correctly. The following is the equivalent in JavaScript:
spell.isCorrect("/mySpell/spell.xml", "alfabet");
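Building on the single-word checks above, an application might want to flag every questionable word in a phrase the user typed. The sketch below is one way to do that. The function name `findMisspelledWords` and the `isCorrectFn` callback are hypothetical. Inside MarkLogic the callback could wrap the real check, for example `(w) => spell.isCorrect("/mySpell/spell.xml", w)`. The tokenizer is deliberately simple and assumes ASCII letters and apostrophes only.

```javascript
// Hypothetical helper: returns the distinct words in `text` that the
// supplied checker reports as not spelled correctly.
function findMisspelledWords(text, isCorrectFn) {
  // Assumption: words are runs of ASCII letters and apostrophes.
  const tokens = text.match(/[A-Za-z']+/g) || [];
  const seen = new Set();
  const misspelled = [];
  for (const token of tokens) {
    if (seen.has(token)) continue; // check each distinct word only once
    seen.add(token);
    if (!isCorrectFn(token)) misspelled.push(token);
  }
  return misspelled;
}
```

Because lookups are case-sensitive, tokens are passed to the checker exactly as typed. An application that wants to ignore case would lowercase them first and use an all-lowercase dictionary.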
You can write a query which returns spelling suggestions based on words in the specified dictionary. Consider the following XQuery statement:
spell:suggest("/mySpell/spell.xml", "alfabet")
Or the equivalent JavaScript program:
spell.suggest("/mySpell/spell.xml", "alfabet");
This returns the following results:
alphabet albeit alphabets aloft abet alphabeted affable alphabet's alphabetic offbeat
The results are ranked in order, with the word most likely to be the intended spelling first. Your application can then ask the user whether one of the suggested words is the word they intended.
Now consider the following XQuery statement:
spell:suggest("/mySpell/spell.xml", "alphabet")
Or the equivalent JavaScript program:
spell.suggest("/mySpell/spell.xml", "alphabet");
This returns the empty sequence, indicating that the word is spelled correctly.
The spelling correction functions only provide suggestions for words that are less than 64 characters in length, and the functions only return suggestions that are less than 64 characters.
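Putting the behavior described above together, an application could turn a ranked suggestion list into a user prompt along the following lines. This is a sketch under stated assumptions: `didYouMean` and `maxChoices` are hypothetical names, the suggestions are assumed to arrive already ranked (as spell:suggest / spell.suggest results are described above), and the length check simply mirrors the 64-character limit so the application can skip a lookup that would not produce suggestions.

```javascript
// The text above says suggestions are only provided for words shorter than
// 64 characters. JavaScript's length counts UTF-16 code units, which is an
// approximation for text outside the Basic Multilingual Plane.
const MAX_WORD_LENGTH = 64;

// Hypothetical helper: builds a prompt from already-ranked suggestions,
// or returns null when there is nothing to offer.
function didYouMean(word, suggestions, maxChoices = 3) {
  if (word.length >= MAX_WORD_LENGTH) return null; // too long to get suggestions
  const ranked = Array.from(suggestions || []);
  if (ranked.length === 0) return null;            // empty result: nothing to prompt
  // Keep the ranking order; the first entry is the most likely spelling.
  const choices = ranked.slice(0, maxChoices).join(", ");
  return `Did you mean: ${choices}?`;
}
```

For the alfabet example above, the first three suggestions would produce the prompt "Did you mean: alphabet, albeit, alphabets?", while the empty result for alphabet produces no prompt at all.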