
cts:entity-dictionary-parse

cts:entity-dictionary-parse(
   $contents as xs:string*,
   [$options as xs:string*]
) as cts:entity-dictionary

Summary

Construct a cts:entity-dictionary object by parsing it from a formatted string.

Parameters
$contents The dictionary entries to parse. Each line (or string) must consist of four tab-delimited fields, in this order: the entity ID, the normalized form of the entity, the word or phrase to match during entity identification, and the entity type. For more details about the fields, see cts:entity. If you pass in multiple formatted strings, they are combined into a single dictionary object.
$options Options with which you can control the behavior of the entity dictionary. You can specify the following options. It is strongly recommended that you use the default option settings.
  • "case-sensitive" or "case-insensitive": Perform case-sensitive or case-insensitive matching of entity names. Specify one or the other. Default: "case-sensitive".
  • "remove-overlaps" or "allow-overlaps": Either eliminate entities with overlapping names or allow them. Specify one or the other. Default: "allow-overlaps".
  • "whole-words" or "partial-words": Either require matches to align with token boundaries, or allow matches to fall within token boundaries. Specify one or the other. Default: "whole-words".

Example

xquery version "1.0-ml";
import module namespace entity="http://marklogic.com/entity"
  at "/MarkLogic/entity.xqy";

(: NOTE: The fields in each line below must be TAB separated. :)
let $dictionary := cts:entity-dictionary-parse(
"11208172	Nixon	Nixon	person:head of state
11208172	Nixon	Richard Nixon	person:head of state
11208172	Nixon	Richard M. Nixon	person:head of state
11208172	Nixon	Richard Milhous Nixon	person:head of state
11208172	Nixon	President Nixon	person:head of state
08932568	Paris	Paris	administrative district:national capital
09145751	Paris	Paris	administrative district:town
09500217	Paris	Paris	imaginary being:mythical being
")
let $input-xml := <node>Nixon visited Paris</node>
return entity:enrich($input-xml, $dictionary)

(: Returns output similar to the following. (Whitespace added to improve
 : readability.)
 :
 : <node xmlns:e="http://marklogic.com/entity">
 :   <e:entity type="person:head of state">Nixon</e:entity> 
 :   visited 
 :   <e:entity type="administrative district:national capital">Paris</e:entity>
 : </node>
 :)
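
Because literal tab characters are easy to lose when copying a query, the following sketch builds each entry with an explicit tab character reference instead. It also shows passing several strings as $contents and supplying the optional $options argument. The local:entry helper is hypothetical and is not part of the MarkLogic API. The non-default options appear only to illustrate the syntax; the default settings remain the recommended choice.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper (not part of the MarkLogic API): joins the four
 : fields of one dictionary entry with a tab character. :)
declare function local:entry(
  $id as xs:string,
  $normalized as xs:string,
  $text as xs:string,
  $type as xs:string
) as xs:string
{
  fn:string-join(($id, $normalized, $text, $type), "&#9;")
};

(: One string holding two newline-separated entries. :)
let $people := fn:string-join((
    local:entry("11208172", "Nixon", "Nixon", "person:head of state"),
    local:entry("11208172", "Nixon", "Richard Nixon", "person:head of state")
  ), "&#10;")

(: A second string holding a single entry. :)
let $places :=
  local:entry("08932568", "Paris", "Paris",
    "administrative district:national capital")

(: Both strings are combined into a single dictionary object. :)
return cts:entity-dictionary-parse(
  ($people, $places),
  ("case-insensitive", "remove-overlaps")
)
```

The resulting dictionary can be passed to entity:enrich in the same way as in the example above, provided the entity library module is imported.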
