This chapter walks through a very simple Entity Services example of creating a model, creating entity instances from source data, and querying the model and instances. Choose either the XQuery walkthrough or the Server-Side JavaScript walkthrough.
All the exercises in this section use the Query Console browser application to evaluate code on MarkLogic Server. You can launch Query Console by navigating to port 8000 of a host running MarkLogic.
For example, if MarkLogic is installed on localhost, launch Query Console by opening the following location in your browser:
http://localhost:8000
To use Query Console, you must have the qconsole-user role or equivalent privileges. You can learn more about Query Console in the Query Console User Guide.
You do not require special security privileges to use the Entity Services API. However, some exercises in this chapter involve deploying application code to MarkLogic, so you should log into Query Console as a user with the admin role or equivalent privileges.
Some exercises in this chapter save generated code and configuration artifacts to the local filesystem on the host where MarkLogic is installed, and later read them back. You can choose any directory, but the directory must be readable and writeable by MarkLogic and by you. The examples use the variable ARTIFACT_DIR to represent this directory in the instructions.
You can use any database for the exercises in this chapter. However, if you would like to isolate this work from the rest of your environment, you can use the procedure in this section to create a new content database named es-gs, with one forest of the same name attached to it.
The following procedure uses the XQuery Admin API to create a database and a forest, and then attach the forest to the database. You could also use the Admin Interface or the REST Management API.
http://localhost:8000/qconsole
(: create a database :)
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";
admin:save-configuration(
  admin:database-create(admin:get-configuration(), "es-gs",
    xdmp:database("Security"), xdmp:database("Schemas")));

(: create a forest :)
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";
admin:save-configuration(
  admin:forest-create(admin:get-configuration(), "es-gs", xdmp:host(), ()));

(: attach the forest to the database :)
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";
admin:save-configuration(
  admin:database-attach-forest(admin:get-configuration(),
    xdmp:database("es-gs"), xdmp:forest("es-gs")));
This section uses XQuery and XML to introduce the Entity Services APIs. If you prefer to use Server-Side JavaScript, see Getting Started Using JavaScript. You can also use JSON with XQuery and XML with JavaScript, but these combinations are not illustrated here.
This exercise ingests the raw source data from which we will create entity instances. One benefit of Entity Services is that you do not have to model your data up front. You can load your data as-is and use it in your application, and then incrementally model your entities.
You usually create entity instances from XML or JSON data. The raw data in this example consists of two XML documents and one JSON document. Each document contains information about a person, such as first name and last name. Each document also includes a unique person identifier (pid).
Use the following procedure to load the raw source documents into your content database. The newly created documents are put into a collection named raw so we can easily reference them later.
http://localhost:8000/qconsole
(: Stage raw source in the form of 2 XML and 1 JSON document :)
xquery version "1.0-ml";
import module namespace es = "http://marklogic.com/entity-services"
    at "/MarkLogic/entity-services/entity-services.xqy";

(: Synthesize source data in memory. Normally, this would come
 : from an external source. :)
let $source-data := (
  <person>
    <pid>1234</pid>
    <given>George</given>
    <family>Washington</family>
  </person>,
  xdmp:unquote('
    {"pid": 2345, "given": "Martha", "family": "Washington"}
  ')/node(),
  <person>
    <pid>3456</pid>
    <given>Alexander</given>
    <family>Hamilton</family>
  </person>
)
for $source in $source-data
return
  let $uri-suffix :=
    typeswitch ($source)
    case element() return ".xml"
    case object-node() return ".json"
    default return ()
  return xdmp:document-insert(
    fn:concat('/es-gs/raw/', $source/pid, $uri-suffix),
    $source,
    <options xmlns="xdmp:document-insert">
      <collections>
        <collection>raw</collection>
      </collections>
    </options>
  )
/es-gs/raw/1234.xml /es-gs/raw/2345.json /es-gs/raw/3456.xml
You define the entity types, attributes, and relationships of your model in an XML or JSON model descriptor. The model descriptor is the foundation for the model. Model descriptors are discussed in detail in Creating and Managing Models.
The model descriptor in this example is based on the Person example from the Entity Services examples on GitHub. For more details about the original example, see Exploring the Entity Services Open-Source Examples.
This exercise saves an XML model descriptor as a file on the filesystem. Discussion of the descriptor follows the procedure. For an equivalent JSON example, see Create a Model Descriptor.
Create a file named person-desc.xml in ARTIFACT_DIR with the following contents.
<es:model xmlns:es="http://marklogic.com/entity-services">
  <es:info>
    <es:title>Person</es:title>
    <es:version>1.0.0</es:version>
    <es:base-uri>http://example.org/example-person/</es:base-uri>
    <es:description>
      A model of a person, to demonstrate several extractions
    </es:description>
  </es:info>
  <es:definitions>
    <Person>
      <es:properties>
        <id><es:datatype>string</es:datatype></id>
        <firstName><es:datatype>string</es:datatype></firstName>
        <lastName><es:datatype>string</es:datatype></lastName>
        <fullName><es:datatype>string</es:datatype></fullName>
        <friends>
          <es:datatype>array</es:datatype>
          <es:items><es:ref>#/definitions/Person</es:ref></es:items>
        </friends>
      </es:properties>
      <es:primary-key>id</es:primary-key>
      <es:required>firstName</es:required>
      <es:required>lastName</es:required>
      <es:required>fullName</es:required>
    </Person>
  </es:definitions>
</es:model>
You now have a file named ARTIFACT_DIR/person-desc.xml that contains the Person model descriptor.
We stored the model on the filesystem because this most closely resembles a real development cycle, in which an important project artifact like the model descriptor is under source control.
The descriptor defines a single entity type named Person. A Person entity instance contains string-valued properties named id, firstName, lastName, and fullName, and a list-valued property named friends.
<Person> <es:properties> <id><es:datatype>string</es:datatype></id> <firstName><es:datatype>string</es:datatype></firstName> <lastName><es:datatype>string</es:datatype></lastName> <fullName><es:datatype>string</es:datatype></fullName> <friends> <es:datatype>array</es:datatype> <es:items><es:ref>#/definitions/Person</es:ref></es:items> </friends> ...
The friends property is a list (array) of references to other Person entities. Since the reference to Person appears in the same descriptor in which Person is defined, it is a local reference. Entity Services knows the shape of the referenced entity type when generating code from a Person model. You can also reference entity types defined elsewhere.
The firstName, lastName, and fullName properties must all be present in every Person entity instance because these properties are explicitly flagged as required through the use of <es:required/>:
<es:required>firstName</es:required> <es:required>lastName</es:required> <es:required>fullName</es:required>
The id property is implicitly required because it is identified as the primary key for a Person:
<es:primary-key>id</es:primary-key>
The primary key is a unique identifier for an entity instance. You are not required to define a primary key, but the existence of a primary key facilitates other Entity Services features; for details, see Identifying the Primary Key Entity Property.
Since the friends property is neither a primary key nor an explicitly required property, it is optional. That is, you can create entities that do not include a friends property.
You can also flag properties with other characteristics, such as whether or not a property should be indexed for efficient search. For more details, see Writing a Model Descriptor.
Inserting an XML or JSON model descriptor document into the special collection http://marklogic.com/entity-services/models tells MarkLogic the document is part of an Entity Services model. Membership in this collection causes MarkLogic to generate semantic triples that define the model.
We authored a model descriptor in Create a Model Descriptor. The following procedure covers the validation and persistence steps that create the model. An explanation of the code follows the procedure.
(: Create a model. :)
xquery version "1.0-ml";
import module namespace es = "http://marklogic.com/entity-services"
    at "/MarkLogic/entity-services/entity-services.xqy";

let $ARTIFACT_DIR := '/space/es/gs/'
let $desc := xdmp:document-get(
  fn:concat($ARTIFACT_DIR, 'person-desc.xml'))
let $validated-desc := es:model-validate($desc)
let $desc-as-json := xdmp:to-json($validated-desc)
return xdmp:document-insert(
  '/es-gs/models/person-1.0.0.json', $desc-as-json,
  <options xmlns="xdmp:document-insert">
    <collections>{
      <collection>http://marklogic.com/entity-services/models</collection>,
      for $coll in xdmp:default-collections()
      return <collection>{$coll}</collection>
    }</collections>
  </options>
)
The query creates the document /es-gs/models/person-1.0.0.json. If the query is unable to open the input model descriptor file, check the permissions on the directory and file.
The first step is to validate the descriptor. Validation introduces overhead, but an invalid descriptor will produce an invalid model, so validation is recommended during development.
let $desc := xdmp:document-get( fn:concat($ARTIFACT_DIR, 'person-desc.xml')) let $validated-desc := es:model-validate($desc)
The function es:model-validate returns a json:object representation of the descriptor. A json:object is a special kind of map:map. This is the form expected by Entity Services API functions that operate on the model, but it is not the proper form for creating a model. Instead, you must persist an XML or JSON descriptor.
If you persist a descriptor as XML, then you must use es:model-validate or es:model-from-xml to convert it to the map:map form if you extract it from the database to pass to an Entity Services function. If you persist the descriptor as JSON, then subsequent conversion is not necessary. Therefore, this example persists a JSON version of the original XML descriptor.
The function xdmp:to-json converts the json:object created by es:model-validate into a JSON object-node that represents the JSON version of our XML descriptor. For example:
let $desc-as-json := xdmp:to-json($validated-desc)
Finally, we insert the descriptor into the database as part of the special Entity Services collection to create the model. The following document insertion adds the Entity Services collection to any default collections associated with the user performing the insertion.
xdmp:document-insert(
  '/es-gs/models/person-1.0.0.json', $desc-as-json,
  <options xmlns="xdmp:document-insert">
    <collections>{
      <collection>http://marklogic.com/entity-services/models</collection>,
      for $coll in xdmp:default-collections()
      return <collection>{$coll}</collection>
    }</collections>
  </options>
)
An instance converter is a library module containing code for transforming your raw source data into entity instances that conform to your model. You can use the Entity Services API to generate a baseline converter, and then customize it to meet the requirements of your application.
This section walks through deploying a converter module in the following steps:
This exercise creates an instance converter module template using the es:instance-converter-generate function. An explanation of the code follows the procedure.
(: Create an instance converter and save it to a file :)
xquery version "1.0-ml";
import module namespace es = "http://marklogic.com/entity-services"
    at "/MarkLogic/entity-services/entity-services.xqy";

let $desc := fn:doc('/es-gs/models/person-1.0.0.json')
let $ARTIFACT_DIR := '/space/es/gs/'   (: MODIFY THIS VALUE :)
return xdmp:save(
  fn:concat($ARTIFACT_DIR, 'person-1.0.0-conv.xqy'),
  es:instance-converter-generate($desc)
)
Set the value of $ARTIFACT_DIR to a directory on your MarkLogic host where the generated code can be saved. Include the trailing directory separator in the pathname.
When you run the query, the file person-1.0.0-conv.xqy is created in ARTIFACT_DIR. Though the generated code is runnable as-is, you will need to customize the code to match the characteristics of your source data and the requirements of your application. The generated code contains extensive comments to assist you with customization.
We could insert the converter module directly into the modules database to which it will eventually be deployed. However, the converter is an important project artifact, so you would normally save it to a file and place it under source control before proceeding with customizations.
The generated module defines the following externally visible functions, plus some private helper functions. The namespace prefix defined for the module is derived from the model title.
person:extract-instance-Person - Create a Person instance from raw source data. The returned instance is a json:object (map:map). You are expected to customize this function to harmonize your source data with your model.
person:instance-to-envelope - Convert an entity instance into an XML or JSON envelope document that encapsulates the instance and the original source. Most applications will use this function as-is, but you might customize it if you include additional data in the envelope.
person:instance-to-canonical - Convert the map:map representation of an instance into its canonical XML or JSON representation. You will not usually need to customize this function or call it directly; it exists for use by the generated instance-to-envelope function.
For more details, see Creating an Instance Converter Module.
The converter module generated by Entity Services implements a modeltitle:extract-instance-T function for each entity type T defined in the descriptor. In our example, the converter module implements a person:extract-instance-Person function.
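The naming rule can be illustrated with a small plain JavaScript sketch that lists the extraction function name expected for each entity type in a descriptor. It does not call the Entity Services API. The helper name extractFunctionNames is hypothetical, and the module prefix is passed in as an argument rather than derived from the model title, since the exact derivation is not shown here.

```javascript
// Hypothetical helper: list the extract-instance function name expected
// for each entity type T defined in a descriptor. The prefix is supplied
// by the caller (an assumption; the generated module defines its own).
function extractFunctionNames(prefix, descriptor) {
  return Object.keys(descriptor.definitions)
    .map((typeName) => prefix + ':extract-instance-' + typeName);
}

const functionNames = extractFunctionNames('person', {
  info: {title: 'Person', version: '1.0.0'},
  definitions: {Person: {properties: {}}}
});
console.log(functionNames.join(', '));
```

A descriptor that defines several entity types would yield one name per type.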
The default implementation of an instance converter assumes the source data has the same shape as a Person entity. However, our source data has pid, given, and family properties instead of id, firstName, lastName, and fullName. You must modify person:extract-instance-Person to derive each entity property as follows:
id from pid
firstName from given
lastName from family
fullName by concatenating given and family
Production applications can require many other types of customizations. For example, you might need to normalize a date value, perform a more sophisticated type conversion, or extract the value of an entity property from somewhere other than the source data.
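As a rough illustration of the mapping described above, the following plain JavaScript sketch performs the same four derivations on an in-memory object. It is not the generated converter code: the function name extractPersonProperties and the use of a plain object for the source are assumptions made for illustration.

```javascript
// Hypothetical helper: maps a raw source record with pid, given, and family
// fields onto the property names used by the Person entity type.
function extractPersonProperties(source) {
  const firstName = String(source.given);
  const lastName = String(source.family);
  return {
    id: String(source.pid),               // id from pid, coerced to a string
    firstName: firstName,                 // firstName from given
    lastName: lastName,                   // lastName from family
    fullName: firstName + ' ' + lastName  // fullName by concatenation
  };
}

const martha = extractPersonProperties(
  {pid: 2345, given: 'Martha', family: 'Washington'});
console.log(martha.fullName);
```

Coercing pid to a string mirrors the string datatype that the model assigns to id.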
Use the following procedure to customize the instance extraction code as described. A discussion of the code follows the procedure.
Confirm that you have write permission on person-1.0.0-conv.xqy in ARTIFACT_DIR. If not, set the permissions accordingly. The file must also be readable by MarkLogic.
Open person-1.0.0-conv.xqy in the text editor of your choice.
Locate the code in person:extract-instance-Person that prepares the value of the id, firstName, lastName, and fullName properties. The code should look similar to the following:
let $id := $source-node/id ! xs:string(.)
let $firstName := $source-node/firstName ! xs:string(.)
let $lastName := $source-node/lastName ! xs:string(.)
let $fullName := $source-node/fullName ! xs:string(.)
Change this code to the following:
let $id := $source-node/pid ! xs:string(.)
let $firstName := $source-node/given ! xs:string(.)
let $lastName := $source-node/family ! xs:string(.)
let $fullName := fn:concat($firstName, " ", $lastName) ! xs:string(.)
Recall that the Person entity type has id, firstName, lastName, fullName, and friends properties. The default implementation of person:extract-instance-Person assumes the source data contains the same properties. For example, the default implementation includes the following code:
let $id := $source-node/id ! xs:string(.) let $firstName := $source-node/firstName ! xs:string(.) let $lastName := $source-node/lastName ! xs:string(.) let $fullName := $source-node/fullName ! xs:string(.)
Our customization changes the names of the source fields to match our source data, and derives the fullName property from the given and family source values. The modified code is shown below.
let $id := $source-node/pid ! xs:string(.) let $firstName := $source-node/given ! xs:string(.) let $lastName := $source-node/family ! xs:string(.) let $fullName := fn:concat($firstName, " ", $lastName) ! xs:string(.)
Like any application code, the converter module must be deployed to MarkLogic before you can use it. Best practice is to install it in the modules database of your App Server. Our example uses the pre-defined App Server on port 8000, which is configured to use the Modules database.
The following procedure uses XQuery to install the customized converter module into the Modules database. You could also use Server-Side JavaScript or the REST, Java, or Node.js Client APIs for this task.
xquery version "1.0-ml"; let $ARTIFACT_DIR := '/space/es/gs/' (: MODIFY THIS VALUE :) return xdmp:document-load( fn:concat($ARTIFACT_DIR, 'person-1.0.0-conv.xqy'), <options xmlns="xdmp:document-load"> <uri>/es-gs/person-1.0.0-conv.xqy</uri> </options> )
An envelope document is the recommended way to persist and interact with entity instances in MarkLogic. An envelope document encapsulates an entity instance with model metadata and the original source. Storing the logical aspects of an entity (canonical instance representation, metadata, source) in one physical document facilitates managing, searching, retrieving, indexing, and securing your data.
An envelope document enables your application to query data as harmonized instances, but still recover the raw source when needed. You can generate either XML or JSON envelope documents.
You can use the person:instance-to-envelope function in the converter module to create entity envelope documents. The input is an instance created by calling person:extract-instance-Person. If you do not explicitly specify an envelope format of xml or json, the function generates an XML envelope.
Use the following procedure to create XML envelope documents from the source documents loaded in Stage the Source Data. Discussion of the code follows the procedure.
The following query creates a Person entity envelope XML document from each source document.
(: Create envelope documents from raw source documents :)
xquery version "1.0-ml";
import module namespace es = "http://marklogic.com/entity-services"
    at "/MarkLogic/entity-services/entity-services.xqy";
import module namespace person = "http://example.org/example-person/Person-1.0.0"
    at "/es-gs/person-1.0.0-conv.xqy";

for $source in fn:collection('raw')
return
  let $instance := person:extract-instance-Person($source)
  let $uri := fn:concat('/es-gs/env/', map:get($instance, 'id'), '.xml')
  return xdmp:document-insert(
    $uri,
    person:instance-to-envelope($instance, "xml"),
    <options xmlns="xdmp:document-insert">
      <collections>
        <collection>person-envelopes</collection>
      </collections>
    </options>
  )
An envelope document can be either XML or JSON. This exercise uses XML envelopes. An XML envelope has the following form. The es:attachments portion of the envelope holds the raw source data.
<es:envelope xmlns:es="http://marklogic.com/entity-services"> <es:instance> <es:info>metadata from info section of descriptor</es:info> ...instance canonical XML.. </es:instance> <es:attachments> source data </es:attachments> </es:envelope>
The equivalent JSON envelope, generated by passing "json" as the second parameter of person:instance-to-envelope, has the following form:
{ "envelope": { "instance": { "info": { ...metadata from info section of descriptor... }, ...instance canonical JSON... }, "attachments": [ ...source data... ] }}
Except when constructing path expressions, you do not usually have to be aware of the internal structure of an envelope document because the Entity Services API includes functions that extract an instance or the attachments from an envelope document for you. For details, see Extracting an Entity Instance from an Envelope Document and Extracting the Original Source from an Envelope Document.
You create an envelope document for some entity type T and envelope format F using the extract-instance-T and instance-to-envelope functions of the instance converter. For example:
(: creating an XML envelope :) modeltitle:instance-to-envelope( modeltitle:extract-instance-T($source), "xml") (: creating a JSON envelope :) modeltitle:instance-to-envelope( modeltitle:extract-instance-T($source), "json")
For example, the sample code does the following to create a Person entity XML envelope:
let $instance := person:extract-instance-Person($source) ... return xdmp:document-insert( $uri, person:instance-to-envelope($instance, "xml"), ...)
Inside person:instance-to-envelope, the person:instance-to-canonical function is called to create the Person entity embedded inside es:envelope/es:instance.
The table below illustrates the progression from raw data to XML envelope document, through use of the instance converter module functions.
The following is an equivalent JSON envelope, generated by calling instance-to-envelope($instance, "json"):
{ "envelope": { "instance": { "info": { "title":"Person", "version":"1.0.0" }, "Person": { "id":"2345", "firstName":"Martha", "lastName":"Washington", "fullName":"Martha Washington"} }, "attachments":[ "<person><pid>2345</pid><given>Martha</given><family>Washington</family></person>" ] }}
Note that the source data in the attachments is represented as a string if it does not match the envelope data format. For example, in the above JSON envelope, the source attachment is a string, rather than an XML node. This has implications for extracting the source from the envelope as a node; see the example in Query the Data.
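The attachment rule can be sketched in plain JavaScript as follows. This is only a model of the envelope shape shown above, not the generated instance-to-envelope function: the helper name toJsonEnvelope and its parameters are hypothetical, and XML source is assumed to arrive as already serialized text.

```javascript
// Hypothetical helper: wrap a Person instance in a JSON envelope of the
// shape shown above. sourceFormat is 'json' or 'xml'. Source that does not
// match the envelope format is kept as a string.
function toJsonEnvelope(info, instance, source, sourceFormat) {
  const attachment = sourceFormat === 'json' ? source : String(source);
  return {
    envelope: {
      instance: {info: info, Person: instance},
      attachments: [attachment]
    }
  };
}

const personInfo = {title: 'Person', version: '1.0.0'};
const marthaInstance = {
  id: '2345', firstName: 'Martha',
  lastName: 'Washington', fullName: 'Martha Washington'
};

// XML source in a JSON envelope: stored as a string.
const fromXml = toJsonEnvelope(personInfo, marthaInstance,
  '<person><pid>2345</pid><given>Martha</given><family>Washington</family></person>',
  'xml');

// JSON source in a JSON envelope: stored as structured data.
const fromJson = toJsonEnvelope(personInfo, marthaInstance,
  {pid: 2345, given: 'Martha', family: 'Washington'}, 'json');

console.log(typeof fromXml.envelope.attachments[0]);
console.log(typeof fromJson.envelope.attachments[0]);
```

The first attachment can only be searched as text, while the second can be addressed by property name.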
This section illustrates one way to search your entity instance data using the cts:search XQuery function. You can also use other MarkLogic document search APIs, search your instances as row data, or use semantic search. The Entity Services API includes tools to facilitate all these forms of search. For details, see Querying a Model or Entity Instances.
The following example uses the XQuery cts:query API to find all Person entities with a lastName property of Washington, and then emits the original source from which the entity was derived.
The following query finds all envelope documents in the person-envelopes collection where the lastName element has the value washington, and then returns the original source data from the envelope.
xquery version "1.0-ml";
import module namespace es = "http://marklogic.com/entity-services"
    at "/MarkLogic/entity-services/entity-services.xqy";

(: match all envelopes containing an entity instance with
 : a lastName property value of 'washington' :)
let $matches := cts:search(
  fn:collection('person-envelopes'),
  cts:element-query(
    fn:QName('http://marklogic.com/entity-services', 'instance'),
    cts:element-value-query(xs:QName('lastName'), 'washington')
  ))
(: extract the original source, as a node :)
for $attachment in $matches/es:envelope/es:attachments/node()
return
  typeswitch ($attachment)
  case element() return $attachment
  case text() return xdmp:from-json-string($attachment)
  default return ()
{ "pid":2345, "given":"Martha", "family":"Washington" } <person xmlns:es="http://marklogic.com/entity-services"> <pid>1234</pid> <given>George</given> <family>Washington</family> </person>
The search matches two entity instances, one extracted from JSON source and one extracted from XML source, so the final query results are one JSON node and one XML node.
The search is limited to the envelope documents by specifying the person-envelopes collection. A container query (cts:element-query) further constrains the search to occurrences within the es:instance portion of an envelope document. Finally, a cts:element-value-query is used to match envelopes where the lastName property value is washington.
cts:search( fn:collection('person-envelopes'), cts:element-query( fn:QName('http://marklogic.com/entity-services', 'instance'), cts:element-value-query(xs:QName('lastName'), 'washington') ))
The container query ensures the search will not find matches in any part of the envelope document except the entity instance. You could similarly search just the es:attachments, but remember that you cannot perform a structured search on JSON source in the attachments because it is stored in the envelope document as a string.
Notice that the example code can return the original XML source data directly out of the envelope document, but the original JSON document must be converted from a string to a JSON node using xdmp:from-json-string, if you want to return it as a node.
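The string-to-node step can be pictured with a plain JavaScript analogue of the typeswitch in the query. This sketch does not use any MarkLogic API: the helper name attachmentToNode is hypothetical, and it assumes that any string attachment holds serialized JSON source.

```javascript
// Hypothetical helper: return an attachment as structured data. A string
// attachment is assumed to hold serialized JSON source and is parsed;
// anything else is assumed to be structured already and is returned as-is.
function attachmentToNode(attachment) {
  return typeof attachment === 'string' ? JSON.parse(attachment) : attachment;
}

const parsedSource = attachmentToNode(
  '{"pid": 2345, "given": "Martha", "family": "Washington"}');
console.log(parsedSource.family);
```

After parsing, individual source properties such as family can be read directly instead of being searched for as text.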
When you created a model in Create a Model, MarkLogic automatically generated some facts from the persisted descriptor, as semantic triples. These facts (and any additional facts you add) define the model and enable semantic queries against the model.
For example, you can use a SPARQL query to discover what entity types are defined by a model, what properties are required in an entity instance of a particular type, or the datatype of a particular entity type property. For more details, see Querying a Model or Entity Instances.
The following procedure uses a SPARQL query to generate a list of all the required properties of an instance of the Person entity type:
The following query returns the title of each required property of a Person entity instance.
prefix es:<http://marklogic.com/entity-services#>
select ?ptitle
where {
  ?x a es:EntityType;
     es:title "Person";
     es:property ?property .
  ?property a es:RequiredProperty;
     es:title ?ptitle
}
You should see results similar to the following:
ptitle
"lastName"
"fullName"
"firstName"
You can also use the SQL and Optic APIs to query your model and entities as rows if you install an Entity Services generated TDE template based on your model. For more details and examples, see Querying a Model or Entity Instances. To learn more about Semantics in MarkLogic Server, see the Semantics Developer's Guide.
This section uses Server-Side JavaScript and JSON to introduce the Entity Services APIs. If you prefer to use XQuery, see Getting Started Using XQuery. You can also use JSON with XQuery and XML with JavaScript, but these combinations are not illustrated here.
This exercise ingests the raw source data from which we will create entity instances. One benefit of Entity Services is that you do not have to model your data up front. You can load your data as-is and use it in your application, and then incrementally model your entities.
You usually create entity instances from XML or JSON data. The raw data in this example consists of two XML documents and one JSON document. Each document contains information about a person, such as first name and last name. Each document also includes a unique person identifier (pid).
Use the following procedure to load the raw source documents into your content database. The newly created documents are put into a collection named raw so we can easily reference them later.
http://localhost:8000/qconsole
'use strict';
declareUpdate();

// Synthesize source data in memory. This would normally come
// from an external source.
const sourceData = [
  fn.head(xdmp.unquote(
    '<person>' +
      '<pid>1234</pid>' +
      '<given>George</given>' +
      '<family>Washington</family>' +
    '</person>')),
  {pid: 2345, given: 'Martha', family: 'Washington'},
  fn.head(xdmp.unquote(
    '<person>' +
      '<pid>3456</pid>' +
      '<given>Alexander</given>' +
      '<family>Hamilton</family>' +
    '</person>'))
];

// Insert each source item into the db as an XML or JSON doc.
sourceData.forEach(function(source) {
  let uri = '/es-gs/raw/';
  if (source instanceof Document) {
    // XML doc created by xdmp.unquote
    uri += source.xpath('/node()/pid/data()') + '.xml';
  } else if (source instanceof Object) {
    uri += source.pid + '.json';
  }
  xdmp.documentInsert(uri, source, {collections: ['raw']});
});
/es-gs/raw/1234.xml /es-gs/raw/2345.json /es-gs/raw/3456.xml
The sourceData array, above, creates raw data in a very artificial way in order to have a self-contained example. Your source data will normally come from an external source, such as files on the file system, an HTTP request payload, or an mlcp job.
Part of this artificiality is the use of xdmp.unquote as a quick way to create an XML node from a literal. You would normally use NodeBuilder to create in-memory XML documents from Server-Side JavaScript.
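The URIs produced by the insert loop follow a simple pattern: a fixed directory, the pid, and a suffix for the document format. The following plain JavaScript sketch reproduces that pattern without any MarkLogic API; the helper name rawUri is hypothetical.

```javascript
// Hypothetical helper: build the URI for a raw source document from its
// pid and its format ('xml' or 'json').
function rawUri(pid, format) {
  return '/es-gs/raw/' + pid + '.' + format;
}

const rawUris = [
  rawUri(1234, 'xml'),
  rawUri(2345, 'json'),
  rawUri(3456, 'xml')
];
console.log(rawUris.join('\n'));
```

Deriving the URI from the pid keeps each source document addressable by its person identifier.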
You define the entity types, entity type properties, and relationships of your model in an XML or JSON model descriptor. The model descriptor is the starting point for creating a model. Model descriptors are discussed in detail in Creating and Managing Models.
The model descriptor in this example is based on the Person example from the Entity Services examples on GitHub. For more details about the original example, see Exploring the Entity Services Open-Source Examples.
This exercise saves a JSON model descriptor as a file on the filesystem. Discussion of the descriptor follows the procedure.
Create a file named person-desc.json in ARTIFACT_DIR with the following contents.
{
  "info": {
    "title": "Person",
    "version": "1.0.0",
    "baseUri": "http://example.org/example-person/",
    "description": "A model of a person, to demonstrate several extractions"
  },
  "definitions": {
    "Person": {
      "properties": {
        "id": {"datatype": "string"},
        "firstName": {"datatype": "string"},
        "lastName": {"datatype": "string"},
        "fullName": {"datatype": "string"},
        "friends": {
          "datatype": "array",
          "items": {"$ref": "#/definitions/Person"}
        }
      },
      "primaryKey": "id",
      "required": ["firstName", "lastName", "fullName"]
    }
  }
}
You now have a file named ARTIFACT_DIR/person-desc.json that contains the Person model descriptor. For an example of the equivalent XML descriptor, see Create a Model Descriptor in the XQuery walkthrough.
We stored the model on the filesystem because this most closely resembles a real development cycle, in which an important project artifact like the model descriptor is under source control.
The descriptor defines a single entity type named Person. A Person entity instance contains string-valued properties named id, firstName, lastName, and fullName, and a list-valued property named friends.
"Person": { "properties": { "id": {"datatype": "string"}, "firstName": {"datatype": "string"}, "lastName": {"datatype": "string"}, "fullName": {"datatype": "string"}, "friends": { "datatype": "array", "items": {"$ref": "#/definitions/Person" } }}, ...
The friends property is a list (array) of references to other Person entities. Since the reference to Person appears in the same descriptor in which Person is defined, it is a local reference. Entity Services knows the shape of the referenced entity type when generating code from a Person model. You can also reference entity types defined elsewhere.
The firstName, lastName, and fullName properties must be present in every Person entity instance because these properties are explicitly flagged as required through the required descriptor property:
"required": ["firstName", "lastName", "fullName"]
The id property is implicitly required because it is identified as the primary key for a Person:
"primaryKey":"id"
The primary key is a unique identifier for an entity instance. You are not required to define a primary key, but the existence of a primary key facilitates other Entity Services features; for details, see Identifying the Primary Key Entity Property.
Since the friends property is neither a primary key nor an explicitly required property, it is optional. That is, you can create Person entities that do not include a friends property.
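The rules described above (explicitly required properties, plus the primary key, with everything else optional) can be sketched in a few lines of plain JavaScript. This is an illustration only; classifyProperties is a hypothetical helper, not an Entity Services function, and it runs without MarkLogic.

```javascript
// Standalone sketch: plain JavaScript, no MarkLogic APIs involved.
// The Person entity type definition from the descriptor in this walkthrough.
const personType = {
  properties: {
    id: { datatype: 'string' },
    firstName: { datatype: 'string' },
    lastName: { datatype: 'string' },
    fullName: { datatype: 'string' },
    friends: { datatype: 'array', items: { $ref: '#/definitions/Person' } }
  },
  primaryKey: 'id',
  required: ['firstName', 'lastName', 'fullName']
};

// Hypothetical helper: split property names into required and optional.
// A property counts as required if it is listed in "required" or is the
// primary key; all remaining properties are optional.
function classifyProperties(entityType) {
  const required = new Set(entityType.required || []);
  if (entityType.primaryKey) required.add(entityType.primaryKey);
  const names = Object.keys(entityType.properties);
  return {
    required: names.filter(name => required.has(name)),
    optional: names.filter(name => !required.has(name))
  };
}

const classified = classifyProperties(personType);
console.log(classified.required.join(', '));   // id, firstName, lastName, fullName
console.log(classified.optional.join(', '));   // friends
```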
You can also flag properties with other characteristics, such as whether or not a property should be indexed for efficient search. For more details, see Writing a Model Descriptor.
Inserting an XML or JSON model descriptor document into the special collection http://marklogic.com/entity-services/models tells MarkLogic the document is part of an Entity Services model. Membership in this collection causes MarkLogic to generate semantic triples that define the model.
We authored a model descriptor in Create a Model Descriptor. The following procedure validates the Person model descriptor and persists it, which creates the model. An explanation of the code follows the procedure.
'use strict';
declareUpdate();
const es = require('/MarkLogic/entity-services/entity-services.xqy');
// Retrieve descriptor from filesystem
const ARTIFACT_DIR = '/space/es/gs/';   // CHANGE THIS VALUE
const desc = fn.head(
  xdmp.documentGet(ARTIFACT_DIR + 'person-desc.json'));
// Create the model
xdmp.documentInsert(
  '/es-gs/models/person-1.0.0.json',
  es.modelValidate(desc),
  {collections: ['http://marklogic.com/entity-services/models']}
);
/es-gs/models/person-1.0.0.json.
If the query is unable to open the model descriptor file, check the permissions on the directory and file.
The model is created by persisting the descriptor as part of the collection http://marklogic.com/entity-services/models.
xdmp.documentInsert(
  '/es-gs/models/person-1.0.0.json',
  es.modelValidate(desc),
  {collections: ['http://marklogic.com/entity-services/models']}
);
The example also uses es.modelValidate to check the descriptor for errors before inserting it. An invalid descriptor will generate an invalid model. If the descriptor is invalid, es.modelValidate throws an exception. If you know your model descriptor is valid, you can skip validation. Skipping validation is faster, but validation is recommended during development.
An instance converter is an XQuery library module containing code for transforming your raw source data into entity instances that conform to your model. You can use the Entity Services API to generate a baseline converter, and then customize it to meet the requirements of your application.
This section walks through deploying a converter module in the following steps:
This exercise creates an instance converter module template using the es.instanceConverterGenerate function. An explanation of the code follows the procedure.
'use strict';
const es = require('/MarkLogic/entity-services/entity-services.xqy');
const ARTIFACT_DIR = '/space/es/gs/';   // CHANGE THIS VALUE
const desc = cts.doc('/es-gs/models/person-1.0.0.json');
xdmp.save(
  ARTIFACT_DIR + 'person-1.0.0-conv.xqy',
  es.instanceConverterGenerate(desc)
);
/person-1.0.0-conv.xqy
is created. We could have inserted the converter module directly into the modules database to which it will eventually be deployed. However, the converter is an important project artifact, so you would normally save it to a file and place it under source control. Also, most applications will require converter customizations.
The generated code is runnable as-is, but you are expected to customize the code to match the characteristics of your source data and the requirements of your application. The generated code contains comments to assist you with customization. You will need to understand some XQuery to customize the converter for a production application.
The generated module defines the following functions. The namespace prefix defined for the module is derived from the model title.
person:extract-instance-Person - Create a Person instance from raw source data. You are expected to customize this function to harmonize your source data with your model.
person:instance-to-envelope - Convert an entity instance into an XML or JSON envelope document that encapsulates the instance and the original source. Most applications will use this function as-is, but you might customize it if you include additional data in the envelope.
person:instance-to-canonical - Convert the JSON object representation of an instance into its canonical XML or JSON representation. You will not usually need to customize this function or call it directly; it exists for use by the generated instance-to-envelope function.
As with any XQuery module in MarkLogic, you can use the instance converter module from Server-Side JavaScript once you install the module. Bring the module into scope using a require statement. For example, if the module is installed in the modules database with the URI /es-gs/person-1.0.0-conv.xqy, then use a require statement such as the following:
const person = require('/es-gs/person-1.0.0-conv.xqy');
Invoke the functions using their JavaScript-style, camel-case names. For example, in the case of the Person entity type, the module converter functions can be invoked from Server-Side JavaScript using the following names, assuming the module is represented by a variable named person, as shown in the above require statement.
person.extractInstancePerson
person.instanceToEnvelope
person.instanceToCanonical
For more details, see Creating an Instance Converter Module.
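The relationship between the XQuery function names and their camel-case JavaScript counterparts can be illustrated with a small standalone sketch. The function toJavaScriptName is hypothetical and only approximates the naming convention for the three converter functions discussed here; it is not how MarkLogic itself performs the mapping.

```javascript
// Standalone sketch: plain JavaScript, no MarkLogic APIs involved.
// Hypothetical helper: drop each hyphen and upper-case the letter after it.
function toJavaScriptName(xqueryLocalName) {
  return xqueryLocalName.replace(/-([A-Za-z])/g, (match, letter) => letter.toUpperCase());
}

const xqueryNames = [
  'extract-instance-Person',
  'instance-to-envelope',
  'instance-to-canonical'
];

for (const name of xqueryNames) {
  console.log(name + ' -> ' + toJavaScriptName(name));
}
// extract-instance-Person -> extractInstancePerson
// instance-to-envelope -> instanceToEnvelope
// instance-to-canonical -> instanceToCanonical
```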
The converter module generated by Entity Services implements a modeltitle:extract-instance-T function for each entity type T defined in the descriptor. In our example, the converter module implements a person:extract-instance-Person function.
The default implementation of an instance converter assumes the source data has the same shape as a Person entity. However, our source data has pid, given, and family properties instead of id, firstName, lastName, and fullName. You must modify person:extract-instance-Person to do the following:
id from pid
firstName from given
lastName from family
fullName by concatenating given and family
Production applications can require many other types of customizations. For example, you might need to normalize a date value, perform a more sophisticated type conversion, or extract the value of an entity property from somewhere other than the source data.
Use the following procedure to customize the instance extraction code. A discussion of the code follows the procedure.
/person-1.0.0-conv.xqy. If not, set the permissions accordingly. The file must also be readable by MarkLogic.
/person-1.0.0-conv.xqy in the text editor of your choice.
person:extract-instance-Person that sets the value of the id, firstName, lastName, and fullName properties. The code should look similar to the following:
let $id := $source-node/id ! xs:string(.)
let $firstName := $source-node/firstName ! xs:string(.)
let $lastName := $source-node/lastName ! xs:string(.)
let $fullName := $source-node/fullName ! xs:string(.)

let $id := $source-node/pid ! xs:string(.)
let $firstName := $source-node/given ! xs:string(.)
let $lastName := $source-node/family ! xs:string(.)
let $fullName := fn:concat($firstName, " ", $lastName) ! xs:string(.)
Recall that the Person entity type has id, firstName, lastName, fullName, and friends properties. The default implementation of person:extract-instance-Person assumes the source data contains the same properties. For example, the default implementation includes the following code:
let $id := $source-node/id ! xs:string(.)
let $firstName := $source-node/firstName ! xs:string(.)
let $lastName := $source-node/lastName ! xs:string(.)
let $fullName := $source-node/fullName ! xs:string(.)
Each of the variable declarations assumes the value of a property in the new entity instance ($instance) is the value of a property with the same name in the source node. Since that assumption does not match the example source data, customization is required.
Our customization changes the names of the source fields to match our source data, and derives the fullName property value from the given and family source values. The modified code is shown below.
let $id := $source-node/pid ! xs:string(.)
let $firstName := $source-node/given ! xs:string(.)
let $lastName := $source-node/family ! xs:string(.)
let $fullName := fn:concat($firstName, " ", $lastName) ! xs:string(.)
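For readers more comfortable with JavaScript than XQuery, the same harmonization logic can be sketched as a plain JavaScript function over an already parsed JSON source. This is an illustration of the mapping only; extractPerson is a hypothetical name, and the generated converter remains an XQuery module. The sample source values are taken from this walkthrough.

```javascript
// Standalone sketch: plain JavaScript, no MarkLogic APIs involved.
// Hypothetical equivalent of the customized mapping: rename pid, given,
// and family, and derive fullName from the two name parts.
function extractPerson(source) {
  const firstName = String(source.given);
  const lastName = String(source.family);
  return {
    id: String(source.pid),              // cast to string, like xs:string(.)
    firstName: firstName,
    lastName: lastName,
    fullName: firstName + ' ' + lastName
  };
}

const personInstance = extractPerson({ pid: 2345, given: 'Martha', family: 'Washington' });
console.log(personInstance.id);         // 2345
console.log(personInstance.fullName);   // Martha Washington
```

Note that the numeric pid in the source becomes a string id in the instance, matching the string datatype declared in the model.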
Like any application code, the converter module must be deployed to MarkLogic before you can use it. Best practice is to install it in the modules database of your App Server. Our example uses the pre-defined App Server on port 8000, which is configured to use the Modules database.
The following procedure uses Server-Side JavaScript to install the customized converter module into the Modules database. You could also use XQuery or the REST, Java, or Node.js Client APIs for this task.
// ** RUN AGAINST MODULES DB **
'use strict';
declareUpdate();
const ARTIFACT_DIR = '/space/es/gs/';   // CHANGE THIS VALUE
xdmp.documentLoad(
  ARTIFACT_DIR + 'person-1.0.0-conv.xqy',
  { uri: '/es-gs/person-1.0.0-conv.xqy' }
);
ARTIFACT_DIR to the directory where you previously saved the converter module. Include the trailing directory separator in the pathname.
An envelope document is the recommended way to persist and interact with entity instances in MarkLogic. An envelope document encapsulates an entity instance with model metadata and the original source. Storing the logical aspects of an entity (canonical instance representation, metadata, source) in one physical document facilitates managing, searching, retrieving, indexing, and securing your data.
An envelope document enables your application to query data as harmonized instances, but still recover the raw source when needed. You can generate either XML or JSON envelope documents.
You can use the person.instanceToEnvelope function in the converter module to create entity envelope documents. The input is an instance created by calling person.extractInstancePerson. If you do not explicitly specify an envelope format of xml or json, the function generates an XML envelope.
Use the following procedure to create envelope documents from the source documents loaded in Stage the Source Data. Discussion of the code follows the procedure.
Person entity envelope document from each source document.
'use strict';
declareUpdate();
const es = require('/MarkLogic/entity-services/entity-services.xqy');
const person = require('/es-gs/person-1.0.0-conv.xqy');
for (const source of fn.collection('raw')) {
  let instance = person.extractInstancePerson(source);
  let uri = '/es-gs/env/' + instance.id + '.xml';
  xdmp.documentInsert(
    uri, person.instanceToEnvelope(instance, "xml"),
    {collections: ['person-envelopes']}
  );
}
An envelope document can be either XML or JSON. This exercise uses XML envelopes. An XML envelope has the following form. The es:attachments portion of the envelope holds the raw source data.
<es:envelope xmlns:es="http://marklogic.com/entity-services">
  <es:instance>
    <es:info>metadata from info section of descriptor</es:info>
    ...instance canonical XML...
  </es:instance>
  <es:attachments>
    source data
  </es:attachments>
</es:envelope>
The equivalent JSON envelope, generated by passing "json" as the second parameter of person.instanceToEnvelope, has the following form:
{ "envelope": {
    "instance": {
      "info": { ...metadata from info section of descriptor... },
      ...instance canonical JSON...
    },
    "attachments": [ ...source data... ]
}}
Except when constructing path expressions, you do not usually need to be aware of the internal structure of an envelope document, because the Entity Services API includes functions that extract an instance or the attachments from an envelope document for you. For details, see Managing Entity Instances.
You create an envelope document for some entity type T using the extractInstanceT and instanceToEnvelope functions of the instance converter. (These are the extract-instance-T and instance-to-envelope functions in the XQuery module.) The general pattern is as follows:
modeltitle.instanceToEnvelope(
  modeltitle.extractInstanceT(source))
For example, the sample code does the following to create a Person entity envelope:
let instance = person.extractInstancePerson(source); ... xdmp.documentInsert( uri, person.instanceToEnvelope(instance, "xml"), ...)
Inside person.instanceToEnvelope, the person.instanceToCanonical function is called to create the Person entity embedded inside es:envelope/es:instance.
The table below illustrates the progression from raw data to XML envelope document, through use of the instance converter module functions.
The following is an equivalent JSON envelope, generated by calling instanceToEnvelope(instance, "json"):
{ "envelope": {
    "instance": {
      "info": {
        "title":"Person",
        "version":"1.0.0"
      },
      "Person": {
        "id":"2345",
        "firstName":"Martha",
        "lastName":"Washington",
        "fullName":"Martha Washington"
      }
    },
    "attachments":[
      "<person><pid>2345</pid><given>Martha</given><family>Washington</family></person>"
    ]
}}
Note that the source data in the attachments is stored as a string if it does not match the envelope data format. For example, in the above JSON envelope, the source attachment is a string, rather than an XML node. This has implications for extracting the source from the envelope as a node; see the example in Query the Data.
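The shape of a JSON envelope, and the way the attachment type depends on the source format, can be illustrated with a standalone sketch in plain JavaScript. The function buildJsonEnvelope is hypothetical and only mimics the documented envelope layout; it is not the generated instanceToEnvelope function. The sketch assumes that XML source is represented as a serialized string and JSON source as an object.

```javascript
// Standalone sketch: plain JavaScript, no MarkLogic APIs involved.
// Hypothetical helper that assembles an object with the layout of a
// JSON envelope: info plus the instance, followed by the attachments.
function buildJsonEnvelope(info, typeName, instance, source) {
  return {
    envelope: {
      instance: { info: info, [typeName]: instance },
      attachments: [source]
    }
  };
}

const modelInfo = { title: 'Person', version: '1.0.0' };
const martha = {
  id: '2345', firstName: 'Martha', lastName: 'Washington', fullName: 'Martha Washington'
};

// XML source does not match the JSON envelope format, so it is a string.
const xmlSource =
  '<person><pid>2345</pid><given>Martha</given><family>Washington</family></person>';
// JSON source matches the envelope format, so it stays structured.
const jsonSource = { pid: 2345, given: 'Martha', family: 'Washington' };

const fromXml = buildJsonEnvelope(modelInfo, 'Person', martha, xmlSource);
const fromJson = buildJsonEnvelope(modelInfo, 'Person', martha, jsonSource);

console.log(typeof fromXml.envelope.attachments[0]);    // string
console.log(typeof fromJson.envelope.attachments[0]);   // object
```

Code that reads attachments back out of an envelope therefore needs to check whether it received a string or a structured value before using it.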
This section illustrates one way to search your entity instance data, using the JSearch API. You can also use other MarkLogic document search APIs, search your instances as row data, or use semantic search. The Entity Services API includes tools to facilitate all these forms of search. For details, see Querying a Model or Entity Instances.
The following example uses the JSearch API to find all Person entities with a lastName property of Washington.
person-envelopes collection where the es:instance element includes a lastName element with the value washington, and then returns the original source data from the envelope.
'use strict';
import jsearch from '/MarkLogic/jsearch.mjs';
// Find all occurrences of lastName with the value 'washington' contained
// in an es:instance element. Return just the documents in the results.
const people = jsearch.collections('person-envelopes');
const matches = people.documents()
  .where(cts.elementQuery(
      fn.QName('http://marklogic.com/entity-services', 'instance'),
      cts.elementValueQuery('lastName', 'washington')))
  .map(match => match.document)
  .result();
// Extract the raw source data from the search results,
// as XML or JSON nodes
const asNodes = [];
for (let match of matches.results) {
  let attachment = fn.head(match.xpath('//*:attachments/node()'));
  if (attachment instanceof Element) {
    // already an XML node
    asNodes.push(attachment);
  } else {
    // serialized JSON; deserialize to a JSON document node
    asNodes.push(fn.head(xdmp.unquote(attachment)));
  }
}
// Dump the results in Query Console. The conversion from array
// to Sequence is just used to finesse the way QC renders array
// items that are XML nodes. It is not functionally significant.
Sequence.from(asNodes);
{ "pid":2345, "given":"Martha", "family":"Washington" }
<person xmlns:es="http://marklogic.com/entity-services">
  <pid>1234</pid>
  <given>George</given>
  <family>Washington</family>
</person>
The search matches two envelope documents, one extracted from JSON source and one extracted from XML source.
The search is first constrained to documents in the person-envelopes collection. Then a container query (cts.elementQuery) further constrains matches to those contained in an es:instance element. Finally, a value query (cts.elementValueQuery) is used to find elements named lastName with the value 'washington'.
const people = jsearch.collections('person-envelopes');
const matches = people.documents()
  .where(cts.elementQuery(
      fn.QName('http://marklogic.com/entity-services', 'instance'),
      cts.elementValueQuery('lastName', 'washington')))
  ...
The container query ensures the search will not find matches in any part of the envelope data except the instance. You could similarly search just the attachments, though you cannot effectively perform a structured search on raw JSON data this way because JSON source is stored in the XML envelope document as a serialized string.
The map feature of JSearch is used to just return the matched documents, eliminating the search metadata such as the URI, relevance score, and confidence. The mapper was used just to streamline the output; a mapper is not required by Entity Services or the JSearch API.
people.documents()
  .where(...)
  .map(match => match.document)
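The effect of such a mapper can be pictured with a standalone sketch in plain JavaScript that uses an ordinary array in place of real search results. Everything here is a mock for illustration: applyMapper is a hypothetical helper, the score and confidence values are placeholders, and the document strings are abbreviated. Only the idea of projecting each result down to its document is taken from the text above.

```javascript
// Standalone sketch: plain JavaScript, no MarkLogic or JSearch involved.
// Mock search output: each result carries metadata plus the document.
const mockOutput = {
  results: [
    { uri: '/es-gs/env/2345.xml', score: 0, confidence: 0,
      document: '<es:envelope>...</es:envelope>' },
    { uri: '/es-gs/env/1234.xml', score: 0, confidence: 0,
      document: '<es:envelope>...</es:envelope>' }
  ],
  estimate: 2
};

// Hypothetical helper: apply a mapper function to every result,
// leaving the rest of the output untouched.
function applyMapper(output, mapper) {
  return { results: output.results.map(mapper), estimate: output.estimate };
}

const mappedOutput = applyMapper(mockOutput, match => match.document);

console.log(Object.keys(mockOutput.results[0]).join(', '));  // uri, score, confidence, document
console.log(typeof mappedOutput.results[0]);                 // string
console.log(mappedOutput.results.length);                    // 2
```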
The search produces the following output, which we saved to the matches variable for subsequent processing.
{"results":[
  <es:envelope xmlns:es="http://marklogic.com/entity-services">
    <es:instance>
      <es:info>
        <es:title>Person</es:title>
        <es:version>1.0.0</es:version>
      </es:info>
      <Person>
        <id>2345</id>
        <firstName>Martha</firstName>
        <lastName>Washington</lastName>
        <fullName>Martha Washington</fullName>
      </Person>
    </es:instance>
    <es:attachments>{"pid":2345, "given":"Martha", "family":"Washington"}</es:attachments>
  </es:envelope>
  <es:envelope xmlns:es="http://marklogic.com/entity-services">
    <es:instance>
      <es:info>
        <es:title>Person</es:title>
        <es:version>1.0.0</es:version>
      </es:info>
      <Person>
        <id>1234</id>
        <firstName>George</firstName>
        <lastName>Washington</lastName>
        <fullName>George Washington</fullName>
      </Person>
    </es:instance>
    <es:attachments>
      <person>
        <pid>1234</pid>
        <given>George</given>
        <family>Washington</family>
      </person>
    </es:attachments>
  </es:envelope>
  ],
  "estimate":2
}
Note that the example code can return the original XML source data directly out of the envelope document because the attachments contain an XML element node. However, the original JSON source data is stored as a string, so it must be converted to a JSON node using xdmp.unquote if you want to work with it as structured data. This conversion is the purpose of the following section of code:
if (attachment instanceof Element) {
  // already an XML node
  asNodes.push(attachment);
} else {
  // serialized JSON; deserialize to a JSON document node
  asNodes.push(fn.head(xdmp.unquote(attachment)));
}
(The accumulation of the attachments into the asNodes array and subsequent conversion of asNodes into a Sequence is just done to finesse the way Query Console displays results.)
For more details and examples, see Querying a Model or Entity Instances.
When you created a model in Create a Model, MarkLogic automatically generated semantic triples from the descriptor. These triples define the model. You can add more facts about the model in the form of additional triples. You can use SPARQL or the Optic API to query a model.
For example, you can use a SPARQL query to discover what entity types are defined by a model, what properties are required in an entity instance of a particular type, or the datatype of a particular entity type property. For more details, see Querying a Model or Entity Instances.
The following procedure uses a SPARQL query to generate a list of all the required properties of an instance of the Person entity type:
Person entity instance.
prefix es:<http://marklogic.com/entity-services#>
select ?ptitle
where {
  ?x a es:EntityType;
     es:title "Person";
     es:property ?property .
  ?property a es:RequiredProperty;
     es:title ?ptitle
}
You should see results similar to the following:
ptitle
"lastName"
"fullName"
"firstName"
You can also use the SQL and Optic APIs to query your model and entities as rows if you install an Entity Services generated TDE template based on your model. For more details and examples, see Querying a Model or Entity Instances. To learn more about Semantics in MarkLogic Server, see the Semantics Developer's Guide.
The following topics can help deepen your understanding of the Entity Services API.
Model descriptors support several features not covered in depth here, such as identifying a primary key and flagging properties for indexing to facilitate fast searches.
For example, you can use Entity Services to generate Search and Client API query options and database configuration artifacts based on your model. You can also generate a Template Driven Extraction (TDE) template that enables row and semantic search of instances. For details, see Generating a TDE Template.