MarkLogic 12 EA 2 Product Documentation
entity:dictionary-load

entity:dictionary-load(
   $path as xs:string,
   $uri as xs:string,
   [$options as xs:string*]
) as empty-sequence()

Summary

Load an entity dictionary from the filesystem into the database in the appropriate format.

Parameters
path: The path to a text file containing the dictionary entries. For format details, see the Usage Notes.
uri: The URI of the dictionary to be created.
options: Options that control the behavior of the entity dictionary. The following options are available. It is strongly recommended that you use the default option settings.
  • "case-sensitive" or "case-insensitive": Perform case-sensitive or case-insensitive matching of entity names. Specify one or the other. Default: "case-sensitive".
  • "remove-overlaps" or "allow-overlaps": Either eliminate entities with overlapping names or allow them. Specify one or the other. Default: "allow-overlaps".
  • "whole-words" or "partial-words": Either require matches to align with token boundaries, or allow matches to fall within token boundaries. Specify one or the other. Default: "whole-words".
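
As a minimal sketch of passing non-default options, the call below supplies two of the option strings listed above as a sequence. The file path and the dictionary URI "/ontology/people-ci" are hypothetical, and the defaults remain the recommended settings.

```xquery
import module namespace entity="http://marklogic.com/entity"
  at "/MarkLogic/entity.xqy";

(: Hypothetical path and URI, chosen for illustration only.
   $options is typed xs:string*, so several options are passed
   as a parenthesized sequence of strings. :)
entity:dictionary-load(
  "/data/example.txt",
  "/ontology/people-ci",
  ("case-insensitive", "remove-overlaps"))
```

Pass at most one option from each pair. Any pair that is omitted keeps its documented default.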

Usage Notes

The entity dictionary should be a text file containing one line per dictionary entry. Each line (entry) must consist of the following tab-delimited fields, in the order shown: identifier, normalized text, matching text, entity type. For more details on the meaning of each field, see cts:entity.
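
To make the field order concrete, the following sketch splits one entry line on tab characters and labels the four fields. It is an illustration only, not part of the entity API: the sample line, the element names, and the error name are all assumptions.

```xquery
(: Hypothetical entry line; &#9; is the character reference for a tab. :)
let $line := "11208172&#9;Nixon&#9;Richard Nixon&#9;person:head of state"
(: Split on the tab character to recover the individual fields. :)
let $fields := fn:tokenize($line, "&#9;")
return
  if (fn:count($fields) eq 4)
  then
    (: Element names are illustrative labels for the four fields. :)
    <entry>
      <identifier>{$fields[1]}</identifier>
      <normalized-text>{$fields[2]}</normalized-text>
      <matching-text>{$fields[3]}</matching-text>
      <entity-type>{$fields[4]}</entity-type>
    </entry>
  else
    fn:error(xs:QName("BAD-ENTRY"),
      "Expected 4 tab-delimited fields per dictionary entry")
```

A check along these lines, run on each line of the file before loading, can catch entries where spaces were used instead of tabs.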

Example

(:
Assuming "/data/example.txt" contains

11208172	Nixon	Nixon	person:head of state
11208172	Nixon	Richard Nixon	person:head of state
11208172	Nixon	Richard M. Nixon	person:head of state
11208172	Nixon	Richard Milhous Nixon	person:head of state
11208172	Nixon	President Nixon	person:head of state:person
08932568	Paris	Paris	administrative district:national capital
09145751	Paris	Paris	administrative district:town
09500217	Paris	Paris	imaginary being:mythical being

The URI "/ontology/people" will contain an entity dictionary with 
four entities (11208172 with 5 alternative matching texts, and the three 
entities 08932568, 09145751, and 09500217 with the same matching text).
:)
import module namespace entity="http://marklogic.com/entity"
  at "/MarkLogic/entity.xqy";

entity:dictionary-load("/data/example.txt","/ontology/people")
    
