
cts:entity-highlight( $node as node(), $expr as item()*, [$dict as cts:entity-dictionary] ) as node()
Returns a copy of the node in which each entity found is replaced
with the result of evaluating the specified expression. You can use this
function to highlight the entities in an XML document in an arbitrary manner.
If you do not need fine-grained control of the XML markup returned,
you can use the entity:enrich XQuery module function instead.
A valid entity enrichment license key is required
to use cts:entity-highlight;
without one, the function throws an exception. With a valid
entity enrichment license, you can enrich text
in English and in any other language for which you hold a valid license
key. For text in a language for which you do not have a valid license key,
cts:entity-highlight finds no entities.
In addition to a valid Entity Enrichment license key, this function requires that you have installed the Entity Enrichment package. For details on installing the Entity Enrichment package, see the Installation Guide and the "Marking Up Documents With Entity Enrichment" chapter of the Search Developer's Guide.
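Because the function throws an exception when the license key is missing, a caller may want to fall back to the unmodified node. The following is a minimal sketch of that idea, assuming the MarkLogic "1.0-ml" dialect and its try/catch expression. The sample document and the <entity> wrapper element are hypothetical, and the sketch catches every error rather than a specific error code, since the exact code is not stated here.

```xquery
xquery version "1.0-ml";

(: Hypothetical sample document, used only for illustration. :)
let $doc := <p>George Washington never visited Norway.</p>
return
  try {
    (: Wrap every entity found in a generic <entity> element. :)
    cts:entity-highlight($doc,
      <entity type="{$cts:entity-type}">{ $cts:text }</entity>)
  } catch ($e) {
    (: Assumption: any failure (for example, a missing license key)
       is logged and the original node is returned unchanged. :)
    (xdmp:log($e), $doc)
  }
```

Catching all errors keeps the sketch short; in practice you may prefer to inspect the caught error and rethrow anything that is not license-related.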
There are six built-in variables to represent an entity match. These variables can be used inline in the expression parameter.
$cts:text as xs:string - The matched text.
$cts:node as text() - The node containing the matched text.
$cts:start as xs:integer - The string-length position of the first character of $cts:text in $cts:node. Therefore, the following always returns true:
  fn:substring($cts:node, $cts:start, fn:string-length($cts:text)) eq $cts:text
$cts:action as xs:string - Use xdmp:set on this to specify what should happen next:
  - "continue" (default): Walk the next match. If there are no more matches, return all evaluation results.
  - "skip": Skip walking any more matches and return all evaluation results.
  - "break": Stop walking matches and return all evaluation results.
$cts:entity-type as xs:string - The type of the matching entity.
$cts:normalized-text as xs:string - The normalized entity text (only applicable for some languages).
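To illustrate how the built-in variables and $cts:action fit together, here is a minimal sketch that sets the action to "break" while handling the first match, so that later entities should be left as plain text. The sample document and the <first-entity> element name are hypothetical, and the entities actually found depend on the installed Entity Enrichment package and the licensed languages.

```xquery
xquery version "1.0-ml";

(: Hypothetical sample document, used only for illustration. :)
let $doc := <p>George Washington met John Adams in Philadelphia.</p>
return
  cts:entity-highlight($doc,
    (
      (: Request that no further matches be walked after this one.
         xdmp:set returns the empty sequence, so it contributes
         nothing to the replacement content. :)
      xdmp:set($cts:action, "break"),
      (: Record the entity type and start offset alongside the text. :)
      <first-entity type="{$cts:entity-type}" start="{$cts:start}">{
        $cts:text
      }</first-entity>
    ))
```

Placing xdmp:set inside a sequence with the replacement element is one way to combine a side effect with a result; a conditional could be used instead to stop only after some condition on $cts:start or $cts:entity-type is met.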
The following are the entity types returned from the
$cts:entity-type built-in variable (in alphabetical order):
FACILITY - A place used as a facility.
GPE - Geo-political entity. Differs from location because it has a person-made aspect to it (for example, California is a GPE because its boundaries were defined by a government).
IDENTIFIER:CREDIT_CARD_NUM - A number identifying a credit card number.
IDENTIFIER:DISTANCE - A number identifying a distance.
IDENTIFIER:EMAIL - Identifies an email address.
IDENTIFIER:LATITUDE_LONGITUDE - Latitude and longitude coordinates.
IDENTIFIER:MONEY - Identifies currency (dollars, euros, and so on).
IDENTIFIER:NUMBER - Identifies a number.
IDENTIFIER:PERSONAL_ID_NUM - A number identifying a social security number or other ID number.
IDENTIFIER:PHONE_NUMBER - A number identifying a telephone number.
IDENTIFIER:URL - Identifies a web site address (URL).
IDENTIFIER:UTM - Identifies Universal Transverse Mercator coordinates.
LOCATION - A geographic location (Mount Everest, for example).
NATIONALITY - The nationality of someone or something (for example, American).
ORGANIZATION - An organization.
PERSON - A person.
RELIGION - A religion.
TEMPORAL:DATE - Date-related.
TEMPORAL:TIME - Time-related.
TITLE - Appellation or honorific associated with a person.
URL - A URL on the world wide web.
UTM - A point in the Universal Transverse Mercator (UTM) coordinate system.
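Since $cts:entity-type carries one of the values listed above, the expression can branch on it to treat entity types differently. The following is a sketch under stated assumptions: the sample document and the <redacted> element are hypothetical, and the choice to mark up people and organizations, redact identifiers, and leave everything else untouched is only an illustration.

```xquery
xquery version "1.0-ml";

(: Hypothetical sample document, used only for illustration. :)
let $doc := <p>George Washington never visited Norway.
  His ID number might be 000-00-0001.</p>
return
  cts:entity-highlight($doc,
    if ($cts:entity-type = ("PERSON", "ORGANIZATION"))
    (: Name the wrapper element after the entity type, in lower case. :)
    then element { fn:lower-case($cts:entity-type) } { $cts:text }
    else if (fn:starts-with($cts:entity-type, "IDENTIFIER:"))
    (: Drop the matched text for every IDENTIFIER:* subtype. :)
    then <redacted type="{$cts:entity-type}"/>
    (: Put the original text back for all other entity types. :)
    else text { $cts:text })
```

The final branch returns the matched text as a text node so that entities of other types appear in the copy as ordinary text.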
let $myxml := <node>George Washington never visited Norway.
If he had a Social Security number,
it might be 000-00-0001.</node>
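return
(: Wrap each entity in an element named after its type. A ":" in the
   name would be read as a namespace prefix separator, so it is
   replaced with "-". :)
cts:entity-highlight($myxml,
  element { fn:replace($cts:entity-type, ":", "-") } { $cts:text })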
=>
<node>
<PERSON>George Washington</PERSON> never visited <GPE>Norway</GPE>.
If he had a Social Security number, it might be
<IDENTIFIER-PERSONAL_ID_NUM>000-00-0001</IDENTIFIER-PERSONAL_ID_NUM>.
</node>