MarkLogic 10 Product Documentation
entity:enrich

entity:enrich(
   $node as node(),
   [$dictionaries as cts:entity-dictionary*],
   [$options as xs:string*],
   [$map as map:map]
) as node()

Summary

Returns the entity-enriched XML for the given XML node using the provided dictionaries. If a text node that is being enriched has a parent element with a schema definition that does not allow element children, then that text node is not enriched. Entity markup is not added within other entities.

Parameters
node The XML node to be enriched.
dictionaries The entity dictionaries to use to identify entities. If you do not specify any dictionaries, built-in entity dictionaries are applied. (Currently there are no built-in dictionaries.) If you specify multiple dictionaries, they are applied sequentially, in the order given.
options A list of options that control processing. The following options are available:
  • "full": Construct full entity markup, which includes the normalized text and entity ID. If omitted, the default is to emit minimal entities with just the entity type and matching text.
map A mapping from entity type to the QName of the element to use to create the entity. If no mapping is provided, the default mapping is used. Entities whose types are not matched in the map will not be emitted.
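
As a minimal sketch of how the optional parameters combine, the query below calls entity:enrich once with only a dictionary and once with the "full" option. It assumes a dictionary has already been saved at the URI "/ontology/people", as in the entity:dictionary-load example; the sample text is illustrative only.

```xquery
xquery version "1.0-ml";

import module namespace entity = "http://marklogic.com/entity"
    at "/MarkLogic/entity.xqy";

(: Assumption: a dictionary was previously saved at this URI. :)
let $dictionary := cts:entity-dictionary-get("/ontology/people")
let $node := <node>Richard Nixon visited Paris.</node>
return (
  (: No options: minimal markup with the entity type and matching text. :)
  entity:enrich($node, $dictionary),
  (: "full": markup that also includes the normalized text and entity ID. :)
  entity:enrich($node, $dictionary, "full")
)
```

Because neither call passes a map, the default mapping applies, so only entities whose types match an entry in the default map are emitted.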

Usage Notes

Mapping can be done piecemeal. The type of an entity is split into segments, delimited by colons, and then each segment is matched in turn against the map until a match is found. If the value of a segment matches a mapping entry whose value is itself a map, then that submap is used to match subsequent segments of the type. If the submap contains no match, then the empty string is used to find a match within the submap.
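
The piecemeal matching described above can be sketched with a nested map. The entity types used as keys (PERSON, IDENTIFIER:EMAIL, IDENTIFIER:URL) are borrowed from the default map for illustration; whether a given dictionary actually produces these types is an assumption.

```xquery
xquery version "1.0-ml";

import module namespace entity = "http://marklogic.com/entity"
    at "/MarkLogic/entity.xqy";

let $map :=
  map:new((
    (: A single-segment type maps directly to an element QName. :)
    map:entry("PERSON", xs:QName("entity:person")),
    (: The first segment "IDENTIFIER" selects a submap, which is then
     : used to match the next segment of the type. :)
    map:entry("IDENTIFIER",
      map:new((
        map:entry("EMAIL", xs:QName("entity:email")),
        map:entry("URL", xs:QName("entity:url")),
        (: The empty string is the fallback within this submap. :)
        map:entry("", xs:QName("entity:id"))
      )))
  ))
return $map
```

Under the rules above, a type such as IDENTIFIER:EMAIL would resolve to entity:email, while an IDENTIFIER subtype with no entry of its own would fall back to the empty-string entry and resolve to entity:id. The resulting map can be passed as the fourth argument to entity:enrich.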

The default map defines a set of default recommended entity names:

e:credit-card-number
A number representing a credit card number. (IDENTIFIER:CREDIT_CARD_NUM)
e:coordinate
Latitude and longitude coordinates. (IDENTIFIER:LATITUDE_LONGITUDE)
e:date
Date-related. (TEMPORAL:DATE)
e:distance
A distance. (IDENTIFIER:DISTANCE)
e:email
An email address. (IDENTIFIER:EMAIL)
e:gpe
A geo-political entity. Differs from location because it has a person-made aspect to it. For example, California is a GPE because its boundaries were defined by a government. (GPE)
e:facility
A place used as a facility. (FACILITY)
e:id
A number identifying a social security number or other ID number. (IDENTIFIER)
e:location
A geographic location (Mount Everest, for example). (LOCATION)
e:money
Currency, such as dollars, euros, and so on. (IDENTIFIER:MONEY)
e:nationality
The nationality of someone or something. For example, American. (NATIONALITY)
e:number
A number. (IDENTIFIER:NUMBER)
e:organization
An organization. (ORGANIZATION)
e:person
A person. (PERSON)
e:phone-number
A number identifying a telephone number. (IDENTIFIER:PHONE_NUMBER)
e:religion
A religion. (RELIGION)
e:title
A title or honorific. (TITLE)
e:url
A URL on the world wide web. (IDENTIFIER:URL)
e:utm
A point in the Universal Transverse Mercator (UTM) coordinate system. (IDENTIFIER:UTM)
e:time
Time-related. (TEMPORAL:TIME)

Example

(: Enrich content using the people ontology from the 
 : entity:dictionary-load example.
 :)

xquery version "1.0-ml";

import module namespace entity="http://marklogic.com/entity" 
    at "/MarkLogic/entity.xqy";

let $map :=
  map:new((
    map:entry("",xs:QName("entity:entity")),
    map:entry("administrative district",xs:QName("entity:gpe")),
    map:entry("facility",xs:QName("entity:facility")),
    map:entry("person",xs:QName("entity:person")),
    map:entry("land",xs:QName("entity:location")),
    map:entry("location",xs:QName("entity:location")),
    map:entry("organization",xs:QName("entity:organization")),
    map:entry("region",xs:QName("entity:gpe"))
    ))
let $myxml := <node>Richard Nixon visited Paris.</node>
return
  entity:enrich($myxml, cts:entity-dictionary-get("/ontology/people"), (), $map)

(: Returns the enriched node. For example:
 :
 : <node xmlns:entity="http://marklogic.com/entity">
 :   <entity:person>Richard Nixon</entity:person> visited <entity:gpe>Paris</entity:gpe>.
 : </node>
 :)
   

Example

(: Enrich content using the people ontology from the 
 : entity:dictionary-load example.
 :)
xquery version "1.0-ml";

import module namespace entity="http://marklogic.com/entity" 
    at "/MarkLogic/entity.xqy";

let $map :=
  map:new((
    map:entry("",xs:QName("entity:entity")),
    map:entry("administrative district",xs:QName("entity:gpe")),
    map:entry("facility",xs:QName("entity:facility")),
    map:entry("person",xs:QName("entity:person")),
    map:entry("land",xs:QName("entity:location")),
    map:entry("location",xs:QName("entity:location")),
    map:entry("organization",xs:QName("entity:organization")),
    map:entry("region",xs:QName("entity:gpe"))
    ))
let $myxml := 
<message xmlns="URN:ietf:params:email-xml"
         xmlns:rfc822="URN:ietf:params:rfc822:">
  <rfc822:subject>Paris Visit</rfc822:subject>
  <content>Richard Nixon visited Paris for secret talks.</content>
</message>
return
  entity:enrich($myxml, cts:entity-dictionary-get("/ontology/people"), (), $map)

(: Returns markup similar to the following. Notice that the text inside
 : the rfc822:subject element is not enriched because the email schema
 : does not allow child elements in the subject.
 :
 : <message xmlns="URN:ietf:params:email-xml" 
 :          xmlns:entity="http://marklogic.com/entity" 
 :          xmlns:rfc822="URN:ietf:params:rfc822:">
 :   <rfc822:subject>Paris Visit</rfc822:subject>
 :   <content>
 :     <entity:person>Richard Nixon</entity:person> visited <entity:gpe>Paris</entity:gpe> for secret talks.
 :   </content>
 : </message>
 :)
  