MarkLogic 9 Product Documentation
cts:entity-dictionary-parse

cts:entity-dictionary-parse(
   $contents as xs:string*,
   [$options as xs:string*]
) as cts:entity-dictionary

Summary

Construct a cts:entity-dictionary object by parsing it from a formatted string.

Parameters
contents The dictionary entries to parse. Each line (or string) must consist of four tab-delimited fields: the entity ID, the normalized form of the entity, the word or phrase to match during entity identification, and the entity type. For more details about the fields, see cts:entity. You can pass in multiple formatted strings; they are combined into a single dictionary object.
options Options with which you can control the behavior of the entity dictionary. You can specify the following options. It is strongly recommended that you use the default option settings.
  • "case-sensitive" or "case-insensitive": Perform case-sensitive or case-insensitive matching of entity names. Specify one or the other. Default: "case-sensitive".
  • "remove-overlaps" or "allow-overlaps": Either eliminate entities with overlapping names or allow them. Specify one or the other. Default: "allow-overlaps".
  • "whole-words" or "partial-words": Either require matches to align with token boundaries, or allow matches to fall within token boundaries. Specify one or the other. Default: "whole-words".
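
The following is a minimal sketch, not a recommended configuration, of how the two parameters might be combined: several formatted strings passed as one sequence, plus explicit options. The variable names ($tab, $people, $places) are illustrative assumptions. Building each entry with fn:string-join and a tab character reference is one way to keep the delimiters intact in editors that convert tabs to spaces. Non-default options appear here only to show the syntax; the default settings remain the recommended choice.

```xquery
xquery version "1.0-ml";

(: "&#9;" is the character reference for a tab. :)
let $tab := "&#9;"

(: Each entry has four fields: entity ID, normalized form,
 : text to match, and entity type. :)
let $people := fn:string-join(
  ("11208172", "Nixon", "Richard Nixon", "person:head of state"), $tab)
let $places := fn:string-join(
  ("08932568", "Paris", "Paris",
   "administrative district:national capital"), $tab)

(: Both strings are combined into a single dictionary object. :)
return cts:entity-dictionary-parse(
  ($people, $places),
  ("case-insensitive", "remove-overlaps"))
```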

Example

xquery version "1.0-ml";
import module namespace entity="http://marklogic.com/entity"
  at "/MarkLogic/entity.xqy";

(: NOTE: The fields in each line below must be TAB separated. :)
let $dictionary := cts:entity-dictionary-parse(
"11208172	Nixon	Nixon	person:head of state
11208172	Nixon	Richard Nixon	person:head of state
11208172	Nixon	Richard M. Nixon	person:head of state
11208172	Nixon	Richard Milhous Nixon	person:head of state
11208172	Nixon	President Nixon	person:head of state
08932568	Paris	Paris	administrative district:national capital
09145751	Paris	Paris	administrative district:town
09500217	Paris	Paris	imaginary being:mythical being
")
let $input-xml := <node>Nixon visited Paris</node>
return entity:enrich($input-xml, $dictionary)

(: Returns output similar to the following. (Whitespace added to improve
 : readability.)
 :
 : <node xmlns:e="http://marklogic.com/entity">
 :   <e:entity type="person:head of state">Nixon</e:entity> 
 :   visited 
 :   <e:entity type="administrative district:national capital">Paris</e:entity>
 : </node>
 :)