MarkLogic 9 Product Documentation
cdict:dictionary-write

cdict:dictionary-write(
   $lang as xs:string,
   $dict as element(cdict:dictionary),
   [$tokenization as xs:boolean]
) as empty-sequence()

Summary

Insert or update a custom dictionary for a language.

Parameters
lang The ISO language code of the dictionary.
dict A custom dictionary. For details on the structure, see Custom Dictionary Format in the Search Developer's Guide.
tokenization Whether to insert the dictionary for use in tokenization or stemming. Set to true for tokenization, false for stemming. Default: false (stemming). This parameter is ignored for languages that use a single dictionary for both stemming and tokenization, such as Japanese and Chinese.
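To show how the tokenization parameter selects between the two dictionaries, the sketch below reads both English dictionaries back. It assumes that the companion function cdict:dictionary-read, in the same library module, accepts the same $lang and optional $tokenization arguments as this function. Check its reference page before relying on that.

```xquery
xquery version "1.0-ml";
import module namespace cdict = "http://marklogic.com/xdmp/custom-dictionary"
  at "/MarkLogic/custom-dictionary.xqy";

(: Stemming dictionary for English: $tokenization is omitted, so it is false. :)
cdict:dictionary-read("en"),
(: Tokenization dictionary for English, if one was written with $tokenization = true. :)
cdict:dictionary-read("en", fn:true())
```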

Required Privileges

This function requires the custom-dictionary-admin role or the following privileges:

http://marklogic.com/xdmp/privileges/custom-dictionary-admin

If your language configuration uses user-defined lexer and/or stemmer plugins, you can define additional privileges for finer control. For details, see Custom Dictionary Security Considerations in the Search Developer's Guide.

Usage Notes

This function validates the supplied dictionary. If the dictionary is valid, it is installed on the cluster; otherwise, validation errors are raised.
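Because an invalid dictionary raises an error rather than being partly installed, a caller may want to catch that error. The following is a minimal sketch using MarkLogic's try/catch extension; it assumes the predeclared error namespace prefix and reads the message from error:format-string. The specific error codes raised by validation are not listed on this page, and the catch clause also catches unrelated errors such as a missing privilege.

```xquery
xquery version "1.0-ml";
import module namespace cdict = "http://marklogic.com/xdmp/custom-dictionary"
  at "/MarkLogic/custom-dictionary.xqy";

let $dict :=
  <cdict:dictionary xmlns:cdict="http://marklogic.com/xdmp/custom-dictionary">
    <cdict:entry>
      <cdict:word>servlets</cdict:word>
      <cdict:stem>servlet</cdict:stem>
    </cdict:entry>
  </cdict:dictionary>
return
  try {
    (: Installs the dictionary only if it validates. :)
    cdict:dictionary-write("en", $dict)
  }
  catch ($e) {
    (: Any error, including a validation failure, lands here;
       return its message instead of aborting the query. :)
    fn:string($e/error:format-string)
  }
```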

Any xml:lang attribute on the dictionary element is ignored. The lang parameter determines what language the dictionary is associated with.
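For example, in the sketch below the dictionary element carries xml:lang="fr", but because the first argument is "en", the dictionary is associated with English. The entry is illustrative only.

```xquery
xquery version "1.0-ml";
import module namespace cdict = "http://marklogic.com/xdmp/custom-dictionary"
  at "/MarkLogic/custom-dictionary.xqy";

(: xml:lang="fr" is ignored; the $lang argument "en" decides the language. :)
cdict:dictionary-write("en",
  <cdict:dictionary xmlns:cdict="http://marklogic.com/xdmp/custom-dictionary"
                    xml:lang="fr">
    <cdict:entry>
      <cdict:word>servlets</cdict:word>
      <cdict:stem>servlet</cdict:stem>
    </cdict:entry>
  </cdict:dictionary>)
```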

When you configure a dictionary for a language, it is associated with the stemmer or lexer configured for that language. If you later change the stemmer or lexer for the language, you must write the dictionary again.
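Rewriting is simplest if the dictionary source is kept somewhere you can reload it from. The sketch below assumes a hypothetical practice of storing the dictionary as a document at the hypothetical URI /dictionaries/en-stemming.xml, and reinstalls it after a stemmer change. If the document is missing, the path yields an empty sequence and the call raises an error, since $dict requires exactly one element.

```xquery
xquery version "1.0-ml";
import module namespace cdict = "http://marklogic.com/xdmp/custom-dictionary"
  at "/MarkLogic/custom-dictionary.xqy";

(: Hypothetical URI where the dictionary source is kept as a document. :)
let $dict := fn:doc("/dictionaries/en-stemming.xml")/cdict:dictionary
(: Write it again so it is associated with the newly configured stemmer. :)
return cdict:dictionary-write("en", $dict)
```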

Changes affecting stemming and tokenization take effect immediately. Queries started after a custom dictionary is written or deleted will use the new behavior.
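One way to observe this is a multi-statement query: each statement separated by a semicolon runs as its own transaction, so the second statement starts after the dictionary is written. The sketch below uses cts:stem to look at the stems of a word; the exact stems returned also depend on the language's stemming configuration, so no specific output is claimed here.

```xquery
xquery version "1.0-ml";
import module namespace cdict = "http://marklogic.com/xdmp/custom-dictionary"
  at "/MarkLogic/custom-dictionary.xqy";

cdict:dictionary-write("en",
  <cdict:dictionary xmlns:cdict="http://marklogic.com/xdmp/custom-dictionary">
    <cdict:entry>
      <cdict:word>Furbies</cdict:word>
      <cdict:stem>Furby</cdict:stem>
    </cdict:entry>
  </cdict:dictionary>);

xquery version "1.0-ml";
(: A new statement, started after the write, so it uses the new dictionary. :)
cts:stem("Furbies", "en")
```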

Documents are not automatically reindexed after a custom dictionary change. To get accurate results for stemmed searches, documents must be reindexed. If it is not practical to reindex all documents, use this process to selectively reindex affected documents:

  1. Before updating the dictionary, record the words affected by the change: the value of the word element in each dictionary entry that is added, removed, or modified.
  2. Search for documents containing these words, and save the URIs.
  3. Update the custom dictionaries.
  4. Make an update that leaves the content of each document found in Step 2 unchanged. For example, you could add an element to each document and then delete it. Any update causes the document to be reindexed.
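The selective reindexing steps above might be scripted as a three-statement query, sketched here under several assumptions. The affected words are taken to be the two words from the dictionary in the Example section. The server field name affected-uris is hypothetical; server fields live in memory on the evaluating app server, which is enough to carry the URIs from the first statement to the last within one request. Documents are assumed to be XML. Step 4 also swaps in a different touch technique from the one described above: each root element is replaced with a copy of itself using xdmp:node-replace, rather than adding and then deleting an element.

```xquery
xquery version "1.0-ml";
(: Steps 1 and 2: find documents containing the affected words and save their URIs. :)
let $words := ("Furbies", "servlets")
let $query := cts:or-query(for $w in $words return cts:word-query($w))
let $uris :=
  for $doc in cts:search(fn:doc(), $query)
  return xdmp:node-uri($doc)
return xdmp:set-server-field("affected-uris", $uris);

xquery version "1.0-ml";
(: Step 3: update the custom dictionary. :)
import module namespace cdict = "http://marklogic.com/xdmp/custom-dictionary"
  at "/MarkLogic/custom-dictionary.xqy";
cdict:dictionary-write("en",
  <cdict:dictionary xmlns:cdict="http://marklogic.com/xdmp/custom-dictionary">
    <cdict:entry>
      <cdict:word>Furbies</cdict:word>
      <cdict:stem>Furby</cdict:stem>
    </cdict:entry>
    <cdict:entry>
      <cdict:word>servlets</cdict:word>
      <cdict:stem>servlet</cdict:stem>
    </cdict:entry>
  </cdict:dictionary>);

xquery version "1.0-ml";
(: Step 4: replace each saved document's root element with a copy of itself.
   The content does not change, but the update causes the document to be reindexed. :)
for $uri in xdmp:get-server-field("affected-uris")
let $root := fn:doc($uri)/*
where fn:exists($root)
return xdmp:node-replace($root, $root)
```

For a large database, collecting URIs with cts:uris (which requires the URI lexicon) is likely cheaper than returning whole documents from cts:search.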

See Also

Example

xquery version "1.0-ml";
import module namespace cdict = "http://marklogic.com/xdmp/custom-dictionary" 
  at "/MarkLogic/custom-dictionary.xqy";
  
let $dict :=
  <cdict:dictionary xmlns:cdict="http://marklogic.com/xdmp/custom-dictionary">
    <cdict:entry>
      <cdict:word>Furbies</cdict:word>
      <cdict:stem>Furby</cdict:stem>
    </cdict:entry>
    <cdict:entry>
      <cdict:word>servlets</cdict:word>
      <cdict:stem>servlet</cdict:stem>
    </cdict:entry>
  </cdict:dictionary>
return cdict:dictionary-write("en", $dict)
  