MarkLogic 9 Product Documentation
cts:entity-walk

cts:entity-walk(
   $node as node(),
   $expr as item()*,
   [$dict as cts:entity-dictionary]
) as item()*

Summary

Walk an XML document or element node, evaluating an expression against any matching entities. This function is similar to cts:entity-highlight in how it processes matched entities, but it differs in what it returns.

Parameters
$node A node to walk. The node must be either an XML document node or an XML element node; it cannot be a text node.
$expr An expression to evaluate for each match. You can use the variables $cts:text, $cts:node, $cts:entity-type, $cts:normalized-text, $cts:entity-id, $cts:start, and $cts:action in the expression. See the Usage Notes for details.
$dict The entity dictionary to use for matching entities in the text of the input node. If you omit this parameter, the default entity dictionary is used. (No default dictionaries currently exist.) See the Usage Notes for details.

Usage Notes

The following variables are available for use inline in the $expr parameter. These variables make aspects of the matched entity available to your inline code.

$cts:node as text()
The node containing the match.
$cts:text as xs:string
The matched text. In the case of overlapping matches, this value may not encompass the entirety of the entity match string. Rather, it contains only the non-overlapping part of the text, in order to prevent introduction of duplicate text in the final result.
$cts:entity-type as xs:string
The type of the matched entity, as defined by the type field of the matching entity dictionary entry.
$cts:entity-id as xs:string
The ID of the matched entity, as defined by the id field of the matching entity dictionary entry.
$cts:normalized-text as xs:string
The normalized entity text (only applicable to some languages).
$cts:start as xs:integer
The offset (in codepoints) of the start of $cts:text in the matched text node.
$cts:action as xs:string
The action to take. Use xdmp:set on this variable in your inline code to specify what should happen next. Set the value to one of the following:
"continue"
Walk the next match. If there are no more matches, return all evaluation results. This is the default action.
"skip"
Skip walking any more matches and return all evaluation results.
"break"
Stop walking matches and return all evaluation results.
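
As a minimal sketch of how $cts:action might be set from inline code, the following query stops the walk once an entity of a given type has been seen. The two-entry dictionary, the input node, and the choice of "person:head of state" as the stopping condition are illustrative assumptions, not part of the function's definition. The fields of each dictionary line must be TAB separated.

```xquery
xquery version "1.0-ml";

(: Illustrative dictionary; fields on each line are TAB separated. :)
let $dictionary :=
  cts:entity-dictionary-parse(
"11208172	Nixon	Nixon	person:head of state
08932568	Paris	Paris	administrative district:national capital
"
)
(: Illustrative input node. :)
let $input-node := <node>Nixon visited Paris</node>
return cts:entity-walk($input-node,
  (
    (: Assumed stopping condition: request "break" on the first
       match whose type is a head of state. :)
    if ($cts:entity-type eq "person:head of state")
    then xdmp:set($cts:action, "break")
    else (),
    (: Contribute the matched text to the evaluation results. :)
    $cts:text
  ), $dictionary)
```

Because "continue" is the default, the expression only needs to call xdmp:set when it wants the walk to end early.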

Example

xquery version "1.0-ml";

(: NOTE: The fields of each line below must be TAB separated. :)
let $dictionary := 
  cts:entity-dictionary-parse(
"11208172	Nixon	Nixon	person:head of state
11208172	Nixon	Richard Nixon	person:head of state
11208172	Nixon	Richard M. Nixon	person:head of state
11208172	Nixon	Richard Milhous Nixon	person:head of state
11208172	Nixon	President Nixon	person:head of state
08932568	Paris	Paris	administrative district:national capital
09145751	Paris	Paris	administrative district:town
09500217	Paris	Paris	imaginary being:mythical being
"
)
let $input-node := <node>Nixon visited Paris</node>
return cts:entity-walk($input-node, 
  (object-node {
     "type": $cts:entity-type,
     "text": $cts:text,
     "normText": $cts:normalized-text,
     "id": $cts:entity-id,
     "start": $cts:start
  }), $dictionary)

(: Produces output similar to the following:
 : { "type":"person:head of state", 
 :   "text":"Nixon", "normText":"Nixon", "id":"11208172", "start":1}
 : { "type":"administrative district:national capital", 
 :   "text":"Paris", "normText":"Paris", "id":"08932568", "start":15}
 : { "type":"administrative district:town", 
 :   "text":"Paris", "normText":"Paris", "id":"09145751", "start":15}
 : { "type":"imaginary being:mythical being", 
 :   "text":"Paris", "normText":"Paris", "id":"09500217", "start":15}
 :)