
cts:entity-highlight( $node as node(), $expr as item()*, [$dict as cts:entity-dictionary] ) as node()
Find entities in a node and replace each matched entity with the result of evaluating an XQuery expression.
You can use this function to easily highlight entities in an XML document in an arbitrary manner. If you do not need fine-grained control of the XML markup returned, you can use the library function entity:enrich instead.
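As a rough sketch of the simpler alternative mentioned above, the following assumes the entity library module is installed at its usual location (/MarkLogic/entity.xqy) and that entity:enrich accepts a node and an entity dictionary. The dictionary entries are illustrative only.

```xquery
xquery version "1.0-ml";
(: Assumption: the entity library module lives at this path. :)
import module namespace entity = "http://marklogic.com/entity"
  at "/MarkLogic/entity.xqy";

(: Assumption: a small in-memory dictionary built only for this sketch. :)
let $dictionary := cts:entity-dictionary((
  cts:entity("11208172", "Nixon", "Richard Nixon", "person"),
  cts:entity("08932568", "Paris", "Paris", "district:national capital")
))
let $input-xml := <node>Richard Nixon never visited Paris.</node>
(: entity:enrich chooses the markup for you; use cts:entity-highlight
   when you need to decide the markup yourself. :)
return entity:enrich($input-xml, $dictionary)
```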
The following variables are available for use inline in the $expr parameter. These variables make aspects of the matched entity available to your inline code.
$cts:node as text() - The node containing the match.
$cts:text as xs:string - The matched text. In the case of overlapping matches, this value may not encompass the entirety of the entity match string. Rather, it contains only the non-overlapping part of the text, in order to prevent introduction of duplicate text in the final result.
$cts:entity-type as xs:string - The type of the matched entity, as defined by the type field of the matching entity dictionary entry.
$cts:entity-id as xs:string - The ID of the matched entity, as defined by the id field of the matching entity dictionary entry.
$cts:normalized-text as xs:string - The normalized entity text (only applicable for some languages).
$cts:start as xs:integer - The offset (in codepoints) of the start of $cts:text in the matched text node.
$cts:action as xs:string - The action to take. Use xdmp:set on this variable in your inline code to specify what should happen next. Set the value to one of the following values:
- "continue"
- Walk the next match. If there are no more matches, return all evaluation results. This is the default action.
- "skip"
- Skip walking any more matches and return all evaluation results.
- "break"
- Stop walking matches and return all evaluation results.
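The action values above can be sketched as follows. This is a minimal illustration, not a definitive pattern: it marks up the current match and then sets $cts:action to "break", so later matches should be left untouched. The entity element name and the sample dictionary are hypothetical.

```xquery
xquery version "1.0-ml";
(: Assumption: a small in-memory dictionary built only for this sketch. :)
let $dictionary := cts:entity-dictionary((
  cts:entity("11208172", "Nixon", "Richard Nixon", "person"),
  cts:entity("08932568", "Paris", "Paris", "district:national capital")
))
let $input-xml := <node>Richard Nixon flew to Paris. Paris was rainy.</node>
return
  cts:entity-highlight($input-xml,
    (
      (: "entity" is a hypothetical wrapper element chosen for this sketch. :)
      <entity type="{$cts:entity-type}">{$cts:text}</entity>,
      (: xdmp:set returns the empty sequence, so it adds nothing to the
         replacement; it only tells the walker to stop after this match. :)
      xdmp:set($cts:action, "break")
    ),
    $dictionary)
```

Leaving $cts:action unset keeps the default "continue" behavior, in which every match is visited.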
xquery version "1.0-ml";
let $dictionary := cts:entity-dictionary((
  cts:entity("11208172", "Nixon", "Nixon", "person"),
  cts:entity("11208172", "Nixon", "Richard Nixon", "person"),
  cts:entity("11208172", "Nixon", "Richard M. Nixon", "person"),
  cts:entity("11208172", "Nixon", "Richard Milhous Nixon", "person"),
  cts:entity("11208172", "Nixon", "President Nixon", "person"),
  cts:entity("08932568", "Paris", "Paris", "district:national capital"),
  cts:entity("09145751", "Paris", "Paris", "district:town"),
  cts:entity("09500217", "Paris", "Paris", "mythical being")
))
let $input-xml := <node>Richard Nixon never visited Paris.</node>
return
  cts:entity-highlight($input-xml,
    (: Skip empty text, which can occur for overlapping matches, and
       turn the entity type into a valid element name. :)
    (if ($cts:text ne "")
     then element { fn:replace($cts:entity-type, ":| ", "-") } { $cts:text }
     else ()),
    $dictionary)
(: Returns output similar to the following. (Whitespace has been added
: here to improve readability.)
:
: <node>
: <person>Richard Nixon</person> never visited
: <district-national-capital>Paris</district-national-capital>.
: </node>
:)
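The other match variables can be used the same way. The sketch below records $cts:entity-id, $cts:entity-type, and $cts:start as attributes on a wrapper element. The entity element and its attribute names are hypothetical choices for illustration, not part of the API.

```xquery
xquery version "1.0-ml";
(: Assumption: a small in-memory dictionary built only for this sketch. :)
let $dictionary := cts:entity-dictionary((
  cts:entity("11208172", "Nixon", "Richard Nixon", "person"),
  cts:entity("08932568", "Paris", "Paris", "district:national capital")
))
let $input-xml := <node>Richard Nixon never visited Paris.</node>
return
  cts:entity-highlight($input-xml,
    (: Emit nothing when $cts:text is empty, as in the example above. :)
    (if ($cts:text ne "")
     then
       <entity id="{$cts:entity-id}"
               type="{$cts:entity-type}"
               start="{$cts:start}">{ $cts:text }</entity>
     else ()),
    $dictionary)
```

Keeping the type in an attribute rather than in the element name avoids rewriting characters such as ":" and spaces that are awkward in element names.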