cts:entity-highlight( $node as node(), $expr as item()*, [$dict as cts:entity-dictionary] ) as node()
Returns a copy of the node in which each entity found is replaced
with the result of evaluating the specified expression. You can use this function
to highlight the entities in an XML document with any markup you choose.
If you do not need fine-grained control of the XML markup returned,
you can use the entity:enrich
XQuery module function instead.
A valid Entity Enrichment license key is required to use cts:entity-highlight; without one, the function throws an exception. With a valid license, you can enrich text in English and in any other language for which you hold a valid license key. For text in a language for which you do not have a valid license key, cts:entity-highlight finds no entities.
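Because cts:entity-highlight throws an exception when no valid license key is installed, code that must run on servers both with and without the license may want a fallback. The following is a minimal sketch, not an official pattern: the function name local:highlight-or-original and the <mark> wrapper element are hypothetical, and the catch clause deliberately catches any error rather than a specific error code, since the exact code is not documented here.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: return a highlighted copy of $node, or $node
   unchanged if cts:entity-highlight raises any error (for example,
   when no valid Entity Enrichment license key is installed). :)
declare function local:highlight-or-original($node as node()) as node()
{
  try {
    (: wrap every entity found in a hypothetical <mark> element :)
    cts:entity-highlight($node, <mark>{$cts:text}</mark>)
  }
  catch ($e) {
    (: record the error in the server log, then fall back to the input :)
    xdmp:log($e),
    $node
  }
};

local:highlight-or-original(
  <p>George Washington never visited Norway.</p>)
```

For text in a language you are not licensed for, no exception is raised; the call returns a copy with no entities marked, so the catch branch does not run in that case.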
In addition to a valid Entity Enrichment license key, this function requires that you have installed the Entity Enrichment package. For details on installing the Entity Enrichment package, see the Installation Guide and the "Marking Up Documents With Entity Enrichment" chapter of the Search Developer's Guide.
Six built-in variables represent an entity match. You can reference them inline in the $expr parameter.
$cts:text as xs:string
- The matched text.
$cts:node as text()
- The text node containing the matched text.
$cts:start as xs:integer
- The string-length position of the first character of $cts:text in $cts:node. Therefore, the following always returns true: fn:substring($cts:node, $cts:start, fn:string-length($cts:text)) eq $cts:text
$cts:action as xs:string
- Use xdmp:set on this variable to specify what should happen next:
  - "continue": (default) Walk the next match. If there are no more matches, return all evaluation results.
  - "skip": Skip walking any more matches and return all evaluation results.
  - "break": Stop walking matches and return all evaluation results.
$cts:entity-type as xs:string
- The type of the matching entity.
$cts:normalized-text as xs:string
- The normalized entity text (only applicable for some languages).
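The variables can be combined in a single expression, which is evaluated once per match. As a sketch (the <entity> element and its attribute names are illustrative choices, not part of the API), the following wraps each entity in an element that records its type, its starting position within the containing text node, and its normalized form:

```xquery
xquery version "1.0-ml";

let $doc := <p>George Washington never visited Norway.</p>
return
  cts:entity-highlight(
    $doc,
    (: evaluated once per entity match; the $cts:* variables
       describe the current match :)
    <entity type="{$cts:entity-type}"
            start="{$cts:start}"
            normalized="{$cts:normalized-text}">{$cts:text}</entity>
  )
```

Since $cts:normalized-text is only meaningful for some languages, the normalized attribute may carry no useful value for English text; drop it if you do not need it.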
The $cts:entity-type built-in variable holds one of the following entity types (listed alphabetically):
FACILITY
- A place used as a facility.
GPE
- Geo-political entity. Differs from location because it has a person-made aspect to it (for example, California is a GPE because its boundaries were defined by a government).
IDENTIFIER:CREDIT_CARD_NUM
- A credit card number.
IDENTIFIER:DISTANCE
- A distance.
IDENTIFIER:EMAIL
- Identifies an email address.
IDENTIFIER:LATITUDE_LONGITUDE
- Latitude and longitude coordinates.
IDENTIFIER:MONEY
- Identifies currency (dollars, euros, and so on).
IDENTIFIER:NUMBER
- Identifies a number.
IDENTIFIER:PERSONAL_ID_NUM
- A Social Security number or other personal identification number.
IDENTIFIER:PHONE_NUMBER
- A telephone number.
IDENTIFIER:URL
- Identifies a web site address (URL).
IDENTIFIER:UTM
- Identifies Universal Transverse Mercator coordinates.
LOCATION
- A geographic location (Mount Everest, for example).
NATIONALITY
- The nationality of someone or something (for example, American).
ORGANIZATION
- An organization.
PERSON
- A person.
RELIGION
- A religion.
TEMPORAL:DATE
- Date-related.
TEMPORAL:TIME
- Time-related.
TITLE
- Appellation or honorific associated with a person.
URL
- A URL on the World Wide Web.
UTM
- A point in the Universal Transverse Mercator (UTM) coordinate system.
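Because the expression is evaluated for every match, it can branch on $cts:entity-type to mark up only selected types and return the matched text unchanged otherwise. The sketch below assumes you want to highlight only the first PERSON entity; the <b> wrapper is an arbitrary choice, and setting $cts:action to "break" stops the walk so later matches are not visited.

```xquery
xquery version "1.0-ml";

let $doc := <p>George Washington met John Adams in Philadelphia.</p>
return
  cts:entity-highlight(
    $doc,
    if ($cts:entity-type eq "PERSON")
    then (
      (: stop walking matches after this one; xdmp:set returns
         the empty sequence, so only the <b> element is emitted :)
      xdmp:set($cts:action, "break"),
      <b>{$cts:text}</b>
    )
    (: any other entity type: keep the original text :)
    else $cts:text
  )
```

Returning $cts:text in the else branch replaces the match with its own text, which leaves non-PERSON entities looking unchanged in the output.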
In the following example, fn:replace turns the colon in types such as IDENTIFIER:PERSONAL_ID_NUM into a hyphen, because a colon in an element name would otherwise be read as a namespace prefix.
let $myxml := <node>George Washington never visited Norway. If he had a Social Security number, it might be 000-00-0001.</node>
return cts:entity-highlight($myxml,
  element { fn:replace($cts:entity-type, ":", "-") } { $cts:text })
=>
<node><PERSON>George Washington</PERSON> never visited <GPE>Norway</GPE>. If he had a Social Security number, it might be <IDENTIFIER-PERSONAL_ID_NUM>000-00-0001</IDENTIFIER-PERSONAL_ID_NUM>.</node>