MarkLogic 9 Product Documentation
cts.entityWalk

cts.entityWalk(
   node as Node,
   callback as function,
   [dict as cts.entityDictionary]
) as null

Summary

Walk an XML document or element node, evaluating a callback function against any matching entities. This function is similar to cts.entityHighlight in how it processes matched entities, but it differs in what it returns.

Parameters
node
A node to walk. The node must be either an XML document node or an XML element node; it cannot be a text node.
callback
A callback function that is invoked on each match. See the Usage Notes for details.
dict
The entity dictionary to use for matching entities in the text of the input node. If you omit this parameter, the default entity dictionary is used. (No default dictionaries currently exist.) See the Usage Notes for details.

Usage Notes

The callback function can use variables in scope to accumulate results. The callback function returns an action string that specifies what happens next.

Your callback function must have the following signature:

function(entityType, text, normText, entityId, node, start)

Where the parameters have the following semantics:

text
A string containing the matched text. In the case of overlapping matches, this value may not encompass the entirety of the entity match string. Instead, it contains only the non-overlapping part of the text, to prevent introduction of duplicate text in the final result.
node
The text node containing the match.
entityType
A string containing the type of the entity, as defined in the type field of the matched cts.entity in the entity dictionary.
entityId
A string containing the ID of the entity, as defined in the id field of the matched entity in the entity dictionary.
normText
A string containing the normalized label of the entity, as defined in the normalized field of the matched entity in the entity dictionary.
start
The offset (in codepoints) of the start of text in the matched text node.
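
Because start is measured in codepoints rather than UTF-16 code units, plain JavaScript string indexing can land on the wrong character when the text contains characters outside the Basic Multilingual Plane. The following is a minimal sketch of one way to slice by codepoints. The helpers sliceByCodepoints and codepointLength are hypothetical names, not part of the MarkLogic API, and the sketch assumes start is 1-based, as the sample output on this page suggests.

```javascript
'use strict';

// Hypothetical helper, not a MarkLogic API.
// Assumes `start` is a 1-based codepoint offset (an assumption drawn from
// the sample output on this page, where the first match has start 1).
function sliceByCodepoints(nodeText, start, length) {
  // Array.from iterates a string by Unicode code points,
  // not by UTF-16 code units.
  const codepoints = Array.from(nodeText);
  return codepoints.slice(start - 1, start - 1 + length).join('');
}

// Hypothetical helper: length of a string in code points.
function codepointLength(s) {
  return Array.from(s).length;
}

// The first character (U+1D4B3) occupies two UTF-16 code units
// but counts as a single codepoint.
const nodeText = '\u{1D4B3} met Nixon';
const matched = 'Nixon';
const start = 7; // 1-based codepoint offset of 'Nixon' in nodeText

const recovered = sliceByCodepoints(nodeText, start, codepointLength(matched));
```

Here recovered equals the matched text, whereas indexing nodeText by UTF-16 code units from position start - 1 would begin one character too early.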

Your callback function should return one of the following values to indicate what should happen next:

Example

// Extract an entity list, in the form of an array of JavaScript objects
'use strict';

// Construct the dictionary. Could also get it from the db.
const dictionary = cts.entityDictionary([
  cts.entity('11208172', 'Nixon', 'Nixon', 'person:head of state'),
  cts.entity('11208172', 'Nixon', 'Richard Nixon', 'person:head of state'),
  cts.entity('11208172', 'Nixon', 'Richard M. Nixon', 'person:head of state'),
  cts.entity('11208172', 'Nixon', 'Richard Milhous Nixon', 'person:head of state'),
  cts.entity('11208172', 'Nixon', 'President Nixon', 'person:head of state'),
  cts.entity('08932568', 'Paris', 'Paris', 'administrative district:national capital'),
  cts.entity('09145751', 'Paris', 'Paris', 'administrative district:town'),
  cts.entity('09500217', 'Paris', 'Paris', 'being:mythical being')
]);
// Construct <node>Nixon visited Paris</node>
const inputNode = new NodeBuilder()
                   .addElement('node', 'Nixon visited Paris')
                   .toNode();
const results = [];
cts.entityWalk(inputNode,
  function(entityType, text, normText, entityId, node, start) {
    results.push({
      type: entityType,
      text: text,
      norm: normText,
      id: entityId,
      start: start
    });
  },
  dictionary);

results;

// Produces results similar to the following:
// [{"type":"person:head of state",
//   "text":"Nixon", "norm":"Nixon", "id":"11208172", "start":1},
//  {"type":"administrative district:national capital",
//   "text":"Paris", "norm":"Paris", "id":"08932568", "start":15},
//  {"type":"administrative district:town",
//   "text":"Paris", "norm":"Paris", "id":"09145751", "start":15},
//  {"type":"being:mythical being",
//   "text":"Paris", "norm":"Paris", "id":"09500217", "start":15}
// ]
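
In the output above, the same span of text ("Paris") is reported once for each dictionary entry it matches. As a possible post-processing step, the sketch below groups such matches by start offset so that each mention carries a list of candidate entities. It is plain JavaScript over sample data shaped like the results array. The helper groupByStart is a hypothetical name, not part of the MarkLogic API, and the sketch assumes every match comes from a single text node, as in the example, because start is relative to the containing text node.

```javascript
'use strict';

// Sample data shaped like the `results` array built in the example.
const results = [
  { type: 'person:head of state', text: 'Nixon', norm: 'Nixon',
    id: '11208172', start: 1 },
  { type: 'administrative district:national capital', text: 'Paris',
    norm: 'Paris', id: '08932568', start: 15 },
  { type: 'administrative district:town', text: 'Paris',
    norm: 'Paris', id: '09145751', start: 15 },
  { type: 'being:mythical being', text: 'Paris',
    norm: 'Paris', id: '09500217', start: 15 }
];

// Hypothetical helper, not a MarkLogic API.
// Groups matches that begin at the same offset into one mention.
// Assumes all matches belong to the same text node.
function groupByStart(matches) {
  const groups = new Map(); // Map preserves insertion order
  for (const m of matches) {
    if (!groups.has(m.start)) {
      groups.set(m.start, { start: m.start, text: m.text, candidates: [] });
    }
    groups.get(m.start).candidates.push({ id: m.id, type: m.type });
  }
  return Array.from(groups.values());
}

const mentions = groupByStart(results);
```

With the sample data, mentions contains one entry for "Nixon" with a single candidate and one entry for "Paris" with three candidates, which a later step could disambiguate.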