MarkLogic Server 11.0 Product Documentation
entity:extract

entity:extract(
   $node as node(),
   [$dictionaries as cts:entity-dictionary*],
   [$options as xs:string*],
   [$map as map:map]
) as element()*

Summary

Extracts entities from a node using the provided entity dictionaries. The matching entities are returned as elements.

Parameters
node The node from which to extract entities. The node must be an XML document node or an XML element node.
dictionaries The entity dictionaries to use to extract entities. If you do not specify any dictionaries, built-in entity dictionaries are applied. (Currently there are no built-in dictionaries.) If you specify multiple dictionaries, they are applied sequentially, in the order given.
options A list of options that control processing. The options are:
  • "full": Emit full entities, which include the normalized text, entity ID, path to the text node, and starting offset within that text node. If omitted, the default is to emit minimal entities with just the entity type and matching text.
map A mapping from entity type to the QName of the element to use to create the entity. If you do not provide a mapping, the default mapping is used. Entities whose types are not matched in the map will not be emitted.
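
The difference between the default (minimal) output and the "full" option can be inspected by running both forms on the same node. The following is a minimal sketch, not a definitive recipe: the in-memory dictionary, its entity ID ("e1"), and the map contents are hypothetical values chosen for illustration.

```xquery
xquery version "1.0-ml";
import module namespace entity="http://marklogic.com/entity"
  at "/MarkLogic/entity.xqy";

(: Hypothetical one-entry dictionary; the ID "e1" is illustrative only. :)
let $dictionary := cts:entity-dictionary((
  cts:entity("e1", "Nixon", "Nixon", "person:head of state")
))
(: Explicit map, so that every entity type resolves to some element name. :)
let $map :=
  map:new((
    map:entry("person", xs:QName("entity:person")),
    map:entry("", xs:QName("entity:entity"))
  ))
let $node := <p>Nixon went to Paris</p>
return (
  (: Minimal entities: entity type and matching text only. :)
  entity:extract($node, $dictionary, (), $map),
  (: Full entities: also the normalized text, entity ID, path, and offset. :)
  entity:extract($node, $dictionary, "full", $map)
)
```

Comparing the two results side by side shows which additional details the "full" option emits for each match.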

Usage Notes

Mapping can be done piecemeal. The type of an entity is split into segments, delimited by a colon, and then each segment is matched in turn against map until a match is found. If the value of a segment matches a mapping entry whose value is itself a map, then that submap is used to match subsequent segments of the type. If the submap contains no match, then the empty string is used to find a match within the submap.
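
The piecemeal matching described above can be sketched with a nested map. This example is an illustration under stated assumptions: the dictionary entries, their IDs, and the element name entity:head-of-state are hypothetical and are not part of the default map.

```xquery
xquery version "1.0-ml";
import module namespace entity="http://marklogic.com/entity"
  at "/MarkLogic/entity.xqy";

(: Hypothetical in-memory dictionary; IDs and types are illustrative only. :)
let $dictionary := cts:entity-dictionary((
  cts:entity("e1", "Nixon", "Nixon", "person:head of state"),
  cts:entity("e2", "Paris", "Paris", "administrative district:national capital")
))
(: Submap used for the segments that follow "person". The "" entry is
 : the fallback for person subtypes with no entry of their own. :)
let $person-map :=
  map:new((
    map:entry("head of state", xs:QName("entity:head-of-state")),
    map:entry("", xs:QName("entity:person"))
  ))
let $map :=
  map:new((
    (: The first segment "person" selects the submap. :)
    map:entry("person", $person-map),
    (: Only the first segment needs to match here. :)
    map:entry("administrative district", xs:QName("entity:gpe")),
    (: Top-level fallback for any other entity type. :)
    map:entry("", xs:QName("entity:entity"))
  ))
return
  entity:extract(<p>Nixon went to Paris</p>, $dictionary, (), $map)
```

Following the rules above, the type "person:head of state" should resolve through the submap, while any other person subtype should fall back to the submap's "" entry.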

The default map defines the following recommended entity element names. The entity type that each name is mapped from appears in parentheses:

e:credit-card-number
A number representing a credit card number. (IDENTIFIER:CREDIT_CARD_NUM)
e:coordinate
Latitude and longitude coordinates. (IDENTIFIER:LATITUDE_LONGITUDE)
e:date
Date-related. (TEMPORAL:DATE)
e:distance
A distance. (IDENTIFIER:DISTANCE)
e:email
An email address. (IDENTIFIER:EMAIL)
e:gpe
Geo-political entity. Differs from location in that it has a person-made aspect to it. For example, California is a GPE because its boundaries were defined by a government. (GPE)
e:facility
A place used as a facility. (FACILITY)
e:id
A number identifying a social security number or other ID number. (IDENTIFIER)
e:location
A geographic location. For example, Mount Everest. (LOCATION)
e:money
Currency, such as dollars, euros, and so on. (IDENTIFIER:MONEY)
e:nationality
The nationality of someone or something. For example, American. (NATIONALITY)
e:number
A number. (IDENTIFIER:NUMBER)
e:organization
An organization. (ORGANIZATION)
e:person
A person. (PERSON)
e:phone-number
A number identifying a telephone number. (IDENTIFIER:PHONE_NUMBER)
e:religion
A religion. (RELIGION)
e:title
A title or honorific. (TITLE)
e:url
A URL on the world wide web. (IDENTIFIER:URL)
e:utm
A point in the Universal Transverse Mercator (UTM) coordinate system. (IDENTIFIER:UTM)
e:time
Time-related. (TEMPORAL:TIME)

Example

(: extract entities, using the people ontology from the 
 : entity:dictionary-load example
 :)
import module namespace entity="http://marklogic.com/entity"
  at "/MarkLogic/entity.xqy";

let $map :=
  map:new((
    map:entry("",xs:QName("entity:entity")),
    map:entry("administrative district",xs:QName("entity:gpe")),
    map:entry("facility",xs:QName("entity:facility")),
    map:entry("person",xs:QName("entity:person")),
    map:entry("land",xs:QName("entity:location")),
    map:entry("location",xs:QName("entity:location")),
    map:entry("organization",xs:QName("entity:organization")),
    map:entry("region",xs:QName("entity:gpe"))
  ))
return
  entity:extract(<p>Nixon went to Paris</p>,
    cts:entity-dictionary-get("/ontology/people"), (), $map)

(: Returns the extracted entities. For example:
 :
 : <entity:person xmlns:entity="http://marklogic.com/entity">Nixon</entity:person>
 : <entity:gpe xmlns:entity="http://marklogic.com/entity">Paris</entity:gpe>
 : <entity:gpe xmlns:entity="http://marklogic.com/entity">Paris</entity:gpe>
 : <entity:entity type="imaginary being:mythical being" xmlns:entity="http://marklogic.com/entity">Paris</entity:entity>
 :)
    