cts.entityHighlight

cts.entityHighlight(
   node as Node,
   callback as function,
   builder as NodeBuilder,
   [dict as cts.entityDictionary]
) as null

Summary

Find entities in a node and replace each matched entity with the result returned by a callback function.

Parameters
node
A node to run entity highlight on. The node must be either a document node or an element node; it cannot be a text node.
callback
A callback function that is invoked on each match. See the Usage Notes for details.
builder
A NodeBuilder object that will be used to construct the highlighted copy of the input node.
dict
The entity dictionary to use for matching entities in the text of the input node. If you omit this parameter, the default entity dictionary is used. (No default dictionaries currently exist.) See the Usage Notes for details.

Usage Notes

You can use this function to easily highlight entities in an XML document in an arbitrary manner. If you do not need fine-grained control of the XML markup returned, you can use the library function entity.enrich instead.

When this function returns, the builder contains the highlighted node. You can extract it using NodeBuilder.toNode.

Your callback function must have the following signature:


function(builder, entityType, text, normText, entityId, node, start)
    

Where the parameters have the following semantics:

builder
The NodeBuilder object used to build the highlighted node copy. Anything you add to builder is added to the final result. This is the same builder you pass in as the builder parameter of cts.entityHighlight.
entityType
A string containing the type of the entity, as defined in the type field of the matched cts.entity in the entity dictionary.
text
A string containing the matched text. In the case of overlapping matches, this value may not encompass the entirety of the entity match string. Instead, it contains only the non-overlapping part of the text, to prevent duplicate text from being introduced into the final result.
normText
A string containing the normalized label of the entity, as defined in the normalized field of the matched entity in the entity dictionary.
entityId
A string containing the ID of the entity, as defined in the id field of the matched entity in the entity dictionary.
node
The text node containing the match.
start
The offset (in codepoints) of the start of text in the matched text node.
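
As an illustration of how these parameters might be used, the following sketch builds a callback that records the metadata of each match while copying the matched text through unchanged. The helper name makeMatchRecorder and the shape of the recorded objects are hypothetical, not part of the MarkLogic API. It assumes, as the Summary suggests, that matched text appears in the result only if the callback adds it to builder.

```javascript
'use strict';

// Hypothetical helper (not part of the MarkLogic API): returns a callback
// with the signature described above, plus the array that the callback fills.
function makeMatchRecorder() {
  const matches = [];
  function callback(builder, entityType, text, normText, entityId, node, start) {
    // Record the metadata passed in for this match.
    matches.push({ entityType, text, normText, entityId, start });
    // Copy the matched text back into the result so the output keeps the
    // original text content. Loose comparison, as in the example on this page.
    if (text != '') {
      builder.addText(text);
    }
    return 'continue';
  }
  return { callback, matches };
}

// Possible usage inside MarkLogic (assumes inputNode and dictionary exist):
//   const recorder = makeMatchRecorder();
//   cts.entityHighlight(inputNode, recorder.callback, new NodeBuilder(), dictionary);
//   recorder.matches;
```

Keeping the recorder separate from the cts.entityHighlight call means the callback can be exercised with a stand-in builder outside the server.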

Your callback function should return one of the following values to indicate what should happen next:

  • 'continue': Proceed with the next match. (Default)
  • 'skip': Skip walking any more matches and return all builder results.
  • 'break': Stop walking matches and return all builder results.
  • null: Continue with the previous action.
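
The return values above can be used to cut processing short. The following sketch wraps each match in an element and returns 'break' once a fixed number of matches has been handled. The helper name makeLimitedHighlighter and the element name 'entity' are illustrative assumptions, not part of the MarkLogic API.

```javascript
'use strict';

// Hypothetical helper: builds a callback that highlights at most
// maxMatches matches and then asks cts.entityHighlight to stop.
function makeLimitedHighlighter(maxMatches) {
  let seen = 0;
  return function(builder, entityType, text, normText, entityId, node, start) {
    // Wrap the matched text in an <entity> element (name chosen for illustration).
    if (text != '') {
      builder.addElement('entity', text);
    }
    seen += 1;
    // 'break' stops walking matches; 'continue' proceeds to the next one.
    return seen >= maxMatches ? 'break' : 'continue';
  };
}

// Possible usage inside MarkLogic (assumes inputNode and dictionary exist):
//   const resultBuilder = new NodeBuilder();
//   cts.entityHighlight(inputNode, makeLimitedHighlighter(1), resultBuilder, dictionary);
//   resultBuilder.toNode();
```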

Example

'use strict';
const dictionary = cts.entityDictionary([
  cts.entity('11208172', 'Nixon', 'Nixon', 'person'),
  cts.entity('11208172', 'Nixon', 'Richard Nixon', 'person'),
  cts.entity('11208172', 'Nixon', 'Richard M. Nixon', 'person'),
  cts.entity('11208172', 'Nixon', 'Richard Milhous Nixon', 'person'),
  cts.entity('11208172', 'Nixon', 'President Nixon', 'person'),
  cts.entity('08932568', 'Paris', 'Paris', 'district:national capital'),
  cts.entity('09145751', 'Paris', 'Paris', 'district:town')
]);
const inputNode = new NodeBuilder()
                   .addElement('node', 'Richard Nixon never visited Paris.')
                   .toNode();
const resultBuilder = new NodeBuilder();
cts.entityHighlight(inputNode,
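  function(builder, entityType, text, normText, entityId, node, start) {
    // The text may be empty for overlapping matches; add nothing in that case.
    if (text != '') {
      // Derive a valid element name from the entity type, for example
      // 'district:national capital' becomes 'district-national-capital'.
      builder.addElement(fn.replace(entityType, ':| ', '-'), text);
    }
  },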
  resultBuilder, dictionary);
resultBuilder.toNode();

// Returns output similar to the following. (Whitespace has been added
// here to improve readability.)
// 
// <node>
//   <person>Richard Nixon</person> never visited 
//   <district-national-capital>Paris</district-national-capital>.
// </node>
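
A variation on the example above, as a rough sketch: instead of addElement, the callback below uses the NodeBuilder methods startElement, addAttribute, addText, and endElement to attach the entity ID and normalized text as attributes. The function name annotateEntity and the attribute names 'id' and 'norm' are assumptions chosen for illustration.

```javascript
'use strict';

// Hypothetical callback: wraps each match in an element named after the
// entity type and carries the entity ID and normalized text as attributes.
function annotateEntity(builder, entityType, text, normText, entityId, node, start) {
  // Nothing to wrap when the matched text is empty.
  if (text == '') {
    return 'continue';
  }
  // Replace colons and spaces so the entity type is usable as an element name.
  const name = String(entityType).replace(/[: ]/g, '-');
  builder.startElement(name);
  builder.addAttribute('id', entityId);
  builder.addAttribute('norm', normText);
  builder.addText(text);
  builder.endElement();
  return 'continue';
}

// Possible usage inside MarkLogic (assumes inputNode and dictionary exist):
//   const resultBuilder = new NodeBuilder();
//   cts.entityHighlight(inputNode, annotateEntity, resultBuilder, dictionary);
//   resultBuilder.toNode();
```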