MarkLogic 12 EA 2 Product Documentation
cts:entity-highlight

cts:entity-highlight(
   $node as node(),
   $expr as item()*,
   [$dict as cts:entity-dictionary]
) as node()

Summary

Returns a copy of the node, replacing each entity found with the result of the specified expression. Use this function to highlight entities in an XML document in an arbitrary manner. If you do not need fine-grained control over the returned XML markup, you can use the entity:enrich XQuery module function instead. A valid entity enrichment license key is required to use cts:entity-highlight; without one, the function throws an exception. With a valid entity enrichment license, you can enrich text in English and in any other language for which you hold a valid license key. For text in a language for which you do not have a valid license key, cts:entity-highlight finds no entities.

Parameters
node
A node to run entity highlight on. The node must be either a document node or an element node; it cannot be a text node.
expr
An expression with which to replace each match. You can use the variables $cts:text, $cts:node, $cts:start, $cts:action, $cts:entity-type, and $cts:normalized-text (described below) in the expression.
dict
The entity dictionary to use for matching entities in the text of the input node. If you omit this parameter, the default entity dictionary is used. (No default dictionaries currently exist.) See the Usage Notes for details.
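
As a rough illustration of the optional dict parameter, the sketch below builds a small in-memory dictionary and passes it as the third argument. It assumes the cts:entity and cts:entity-dictionary constructors are available in your MarkLogic version; the entity IDs ("e1", "e2") and the dictionary entries are hypothetical and chosen only for this example.

```xquery
xquery version "1.0-ml";

(: Hypothetical in-memory dictionary; the IDs and entries are illustrative only. :)
(: Assumed argument order for cts:entity: id, normalized text, text, type. :)
let $dict := cts:entity-dictionary((
  cts:entity("e1", "George Washington", "George Washington", "PERSON"),
  cts:entity("e2", "Norway", "Norway", "GPE")
))
let $myxml := <node>George Washington never visited Norway.</node>
return
cts:entity-highlight($myxml,
  (: These example types contain no ":" so they are usable as element names. :)
  element { $cts:entity-type } { $cts:text },
  $dict)
```

The entity types used here are plain names; a type containing a colon would need the same fn:replace treatment shown in the Example section before it could serve as an element name.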

Usage Notes

In addition to a valid Entity Enrichment license key, this function requires that you have installed the Entity Enrichment package. For details on installing the Entity Enrichment package, see the Installation Guide and the "Marking Up Documents With Entity Enrichment" chapter of the Search Developer's Guide.

There are six built-in variables to represent an entity match. These variables can be used inline in the expression parameter.

$cts:text as xs:string

The matched text.

$cts:node as text()

The node containing the matched text.

$cts:start as xs:integer

The string-length position of the first character of $cts:text in $cts:node. Therefore, the following always returns true:

fn:substring($cts:node, $cts:start,
             fn:string-length($cts:text)) eq $cts:text

$cts:action as xs:string

Use xdmp:set on this variable to specify what should happen next:

"continue"
(default) Walk the next match. If there are no more matches, return all evaluation results.
"skip"
Skip walking any more matches and return all evaluation results.
"break"
Stop walking matches and return all evaluation results.

$cts:entity-type as xs:string

The type of the matching entity.

$cts:normalized-text as xs:string

The normalized entity text (only applicable for some languages).
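
The built-in variables described above can be combined in a single replacement expression. The following is a minimal sketch, not a prescribed pattern: it records the entity type, start position, and normalized text of each match as attributes on a wrapper element. The element name entity and its attribute names are hypothetical.

```xquery
xquery version "1.0-ml";

let $myxml := <node>George Washington never visited Norway.</node>
return
cts:entity-highlight($myxml,
  (: "entity" and its attribute names are illustrative, not part of the API. :)
  <entity type="{ $cts:entity-type }"
          start="{ $cts:start }"
          normalized="{ $cts:normalized-text }">{ $cts:text }</entity>)
```

Keeping the type in an attribute rather than in the element name avoids having to rewrite types that contain a colon.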

The following are the entity types returned from the $cts:entity-type built-in variable (in alphabetical order):

FACILITY
A place used as a facility.
GPE
Geo-political entity. Differs from location because it has a person-made aspect to it (for example, California is a GPE because its boundaries were defined by a government).
IDENTIFIER:CREDIT_CARD_NUM
A credit card number.
IDENTIFIER:DISTANCE
A number identifying a distance.
IDENTIFIER:EMAIL
Identifies an email address.
IDENTIFIER:LATITUDE_LONGITUDE
Latitude and longitude coordinates.
IDENTIFIER:MONEY
Identifies currency (dollars, euros, and so on).
IDENTIFIER:NUMBER
Identifies a number.
IDENTIFIER:PERSONAL_ID_NUM
A social security number or other personal ID number.
IDENTIFIER:PHONE_NUMBER
A telephone number.
IDENTIFIER:URL
Identifies a web site address (URL).
IDENTIFIER:UTM
Identifies Universal Transverse Mercator coordinates.
LOCATION
A geographic location (Mount Everest, for example).
NATIONALITY
The nationality of someone or something (for example, American).
ORGANIZATION
An organization.
PERSON
A person.
RELIGION
A religion.
TEMPORAL:DATE
Date-related.
TEMPORAL:TIME
Time-related.
TITLE
Appellation or honorific associated with a person.
URL
A URL on the world wide web.
UTM
A point in the Universal Transverse Mercator (UTM) coordinate system.
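
Because $cts:entity-type is an ordinary string, the replacement expression can branch on it. The sketch below, offered as one possible approach, marks up only PERSON and ORGANIZATION matches and emits every other match as plain text; the span element and its class value are hypothetical.

```xquery
xquery version "1.0-ml";

let $myxml := <node>George Washington never visited Norway.</node>
return
cts:entity-highlight($myxml,
  if ($cts:entity-type = ("PERSON", "ORGANIZATION"))
  (: Wrap only the entity types of interest. :)
  then <span class="entity">{ $cts:text }</span>
  (: Emit any other match as an unmarked text node. :)
  else text { $cts:text })
```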

Example

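xquery version "1.0-ml";

let $myxml := <node>George Washington never visited Norway.
              If he had a Social Security number,
              it might be 000-00-0001.</node>
return
(: Wrap each match in an element named after its entity type. A ":" in the
   type would be read as a namespace prefix separator, so replace it with "-". :)
cts:entity-highlight($myxml,
   element { fn:replace($cts:entity-type, ":", "-") } { $cts:text })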

=>
<node>
  <PERSON>George Washington</PERSON> never visited <GPE>Norway</GPE>.
  If he had a Social Security number, it might be
  <IDENTIFIER-PERSONAL_ID_NUM>000-00-0001</IDENTIFIER-PERSONAL_ID_NUM>.
</node>
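
The $cts:action variable described in the Usage Notes can be set from inside the replacement expression. As a hedged variant of the example above, the following sketch wraps the first entity found and then sets the action to "break" so that no further matches are walked. The exact output depends on your dictionary and license, so none is shown.

```xquery
xquery version "1.0-ml";

(: Sketch: wrap only the first entity found, then stop walking matches. :)
let $myxml := <node>George Washington never visited Norway.</node>
return
cts:entity-highlight($myxml,
  (
    element { fn:replace($cts:entity-type, ":", "-") } { $cts:text },
    (: xdmp:set returns the empty sequence, so it adds nothing to the output. :)
    xdmp:set($cts:action, "break")
  ))
```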
