
cts:entity-walk( $node as node(), $expr as item()*, [$dict as cts:entity-dictionary] ) as item()*
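Walk an XML document or element node, evaluating an expression against any
matching entities. This function is similar to cts:entity-highlight in how it
processes matched entities, but it differs in what it returns: it returns the
results of evaluating the expression for each match, rather than a copy of
the input node with the matched text replaced.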
The following variables are available for use inline in the $expr parameter.
These variables make aspects of the matched entity available to your inline code.
$cts:node as text() - The node containing the match.
$cts:text as xs:string - The matched text. In the case of overlapping matches, this value may not encompass the entirety of the entity match string. Rather, it contains only the non-overlapping part of the text, in order to prevent introduction of duplicate text in the final result.
$cts:entity-type as xs:string - The type of the matched entity, as defined by the type field of the matching entity dictionary entry.
$cts:entity-id as xs:string - The ID of the matched entity, as defined by the id field of the matching entity dictionary entry.
$cts:normalized-text as xs:string - The normalized entity text (only applicable to some languages).
$cts:start as xs:integer - The offset (in codepoints) of the start of $cts:text in the matched text node.
$cts:action as xs:string - The action to take. Use xdmp:set on this variable in your inline code to specify what should happen next. Set the value to one of the following:
- "continue": Walk the next match. If there are no more matches, return all evaluation results. This is the default action.
- "skip": Skip walking any more matches and return all evaluation results.
- "break": Stop walking matches and return all evaluation results.
xquery version "1.0-ml";
(: NOTE: The fields of each dictionary line must be TAB separated.
   The tabs are written here as &#9; character references so that
   they survive copying. Field order: id, normalized text, text, type. :)
let $dictionary :=
  cts:entity-dictionary-parse(
"11208172&#9;Nixon&#9;Nixon&#9;person:head of state
11208172&#9;Nixon&#9;Richard Nixon&#9;person:head of state
11208172&#9;Nixon&#9;Richard M. Nixon&#9;person:head of state
11208172&#9;Nixon&#9;Richard Milhous Nixon&#9;person:head of state
11208172&#9;Nixon&#9;President Nixon&#9;person:head of state
08932568&#9;Paris&#9;Paris&#9;administrative district:national capital
09145751&#9;Paris&#9;Paris&#9;administrative district:town
09500217&#9;Paris&#9;Paris&#9;imaginary being:mythical being
"
  )
let $input-node := <node>Nixon visited Paris</node>
(: Build one JSON object per matched entity. :)
return cts:entity-walk($input-node,
  (object-node {
    "type": $cts:entity-type,
    "text": $cts:text,
    "normText": $cts:normalized-text,
    "id": $cts:entity-id,
    "start": $cts:start
  }), $dictionary)
(: Produces output similar to the following:
: { "type":"person:head of state",
: "text":"Nixon", "normText":"Nixon", "id":"11208172", "start":1}
: { "type":"administrative district:national capital",
: "text":"Paris", "normText":"Paris", "id":"08932568", "start":15}
: { "type":"administrative district:town",
: "text":"Paris", "normText":"Paris", "id":"09145751", "start":15}
: { "type":"imaginary being:mythical being",
: "text":"Paris", "normText":"Paris", "id":"09500217", "start":15}
:)
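
The example above leaves $cts:action at its default of "continue". The following is a minimal sketch of changing the action from inside the inline expression, so that the walk stops after the first match. It assumes a small two-entry dictionary that reuses entries from the example above, and it assumes the tabs can be written as &#9; character references. No expected output is shown because the sketch has not been verified against a running server.

```xquery
xquery version "1.0-ml";
(: Sketch only: a reduced dictionary for illustration.
   Fields are TAB separated (written as &#9;): id, normalized text, text, type. :)
let $dictionary :=
  cts:entity-dictionary-parse(
"11208172&#9;Nixon&#9;Nixon&#9;person:head of state
08932568&#9;Paris&#9;Paris&#9;administrative district:national capital
"
  )
let $input-node := <node>Nixon visited Paris</node>
return cts:entity-walk($input-node,
  (
    (: Ask the walk to stop once this match has been evaluated. :)
    xdmp:set($cts:action, "break"),
    (: The value contributed to the result for this match. :)
    $cts:text
  ), $dictionary)
```

Because xdmp:set returns the empty sequence, only $cts:text contributes to the evaluation result for the match. A condition such as a test on $cts:start or $cts:entity-type could be placed around the xdmp:set call to stop the walk only for particular matches.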