This chapter describes how to create, retrieve, update, and delete entity instances derived from a model created with MarkLogic Entity Services. The chapter covers the following topics:
This section introduces entity instance concepts helpful in creating, persisting, querying, and extracting entity instance data. The following topics are included:
An entity instance is a concrete instantiation of an entity type defined in a model.
For example, suppose you have a JSON model descriptor that defines a Person entity type with the following properties. This is based on the model in Getting Started With Entity Services.
"Person": { "properties": { "id": {"datatype": "string"}, "firstName": {"datatype": "string"}, "lastName": {"datatype": "string"}, "fullName": {"datatype": "string"}, "friends": { "datatype": "array", "items": {"$ref": "#/definitions/Person" } }}, ... }
Then the canonical representation of a Person instance would have the following form, depending on whether you choose to work with XML or JSON.
By convention, an instance is stored as child XML elements or JSON properties of an envelope document. You can extract an instance from an envelope as XML or JSON, regardless of the envelope format. For details, see What is an Envelope Document? and Extracting an Entity Instance from an Envelope Document.
An instance can have multiple representations, depending on the context:
An XML envelope document contains an es:instance XML element with a child element that is the canonical XML representation of the instance. A JSON envelope document contains an "instance" property that contains the canonical JSON representation of the instance. The canonical representation is the one on which queries are based. For details, see What is an Envelope Document?. For more details, see Example: Entity Instance Representations.
If you follow the Entity Services conventions, your entity instances are persisted in MarkLogic as part of an envelope document. An envelope document encapsulates instance data with related metadata that might be useful to your application. You can use either XML or JSON envelopes.
An envelope document for some entity type T is created using the instance-to-envelope function in T's instance converter module. For more details, see Creating an Entity Instance from a Data Source and Creating an Instance Converter Module.
An envelope document has the following form by default.
The instance section contains the canonical representation of the instance, plus metadata such as the title and version of the model from which the entity type is derived. The attachments section contains the source data, by convention; you can add additional attachments.
The envelope format does not have to match the format of your raw source data. You can generate JSON envelopes for instances based on XML source and vice versa. However, if the source and envelope formats differ, the raw source is stored in the attachments section of the envelope as a string.
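The default envelope layout and the attachment rule described above can be sketched in plain JavaScript. This is not the Entity Services API: buildJsonEnvelope and its parameters are hypothetical names, and the layout simply mirrors the default JSON envelope described in this section.

```javascript
// Hypothetical helper, not part of Entity Services: assembles an object
// shaped like the default JSON envelope.
function buildJsonEnvelope(typeName, info, instance, source) {
  return {
    envelope: {
      instance: {
        info: info,            // model title and version
        [typeName]: instance   // canonical instance, keyed by entity type name
      },
      // JSON source (an object) matches a JSON envelope and is kept as-is;
      // source in another format is passed in, and stored, as a string.
      attachments: [source]
    }
  };
}

const info = { title: "Person", version: "1.0.0" };
const person = {
  id: "1234",
  firstName: "George",
  lastName: "Washington",
  fullName: "George Washington"
};
// XML source going into a JSON envelope, so it is carried as a string.
const xmlSource =
  "<person><pid>1234</pid><given>George</given><family>Washington</family></person>";

const envelopeDoc = buildJsonEnvelope("Person", info, person, xmlSource);
console.log(JSON.stringify(envelopeDoc, null, 2));
```

In a real application the envelope is produced by the instance-to-envelope function of the generated converter, so a helper like this is only useful for reasoning about the shape of the document.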
You can customize an envelope document to include other information, but you should generally not modify the instance portion. The instance data should accurately reflect the entity type definition in your model. If you need to normalize or derive property values, do so in the extract-instance-T function of your instance converter.
If you customize the envelope by adding data to the attachments element, then you can use the es:instance-get-attachments XQuery function or the es.instanceGetAttachments JavaScript function to retrieve the data. If you put it elsewhere in the envelope, then you are solely responsible for retrieving it from the envelope.
The Entity Services API includes functions for retrieving the instance data and attachments from an envelope. For details, see Extracting an Entity Instance from an Envelope Document and Extracting the Original Source from an Envelope Document.
This example illustrates the various instance representations discussed in What is an Instance?.
This example uses the Person entity type from the model defined in Getting Started With Entity Services.
Line | Representation | Example |
---|---|---|
1 | Raw Source | <person> <pid>1234</pid> <given>George</given> <family>Washington</family> </person> |
2 | In-memory instance, as returned by the extract-instance-T function of the instance converter. Shown here as JSON for readability, but really a json:object (map:map). | {"$attachments": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<person>\n <pid>1234</pid>\n <given>George</given>\n <family>Washington</family>\n</person>", "$type": "Person", "id": "1234", "firstName": "George", "lastName": "Washington", "fullName": "George Washington" } |
3 | Canonical XML representation | <Person> <id>1234</id> <firstName>George</firstName> <lastName>Washington</lastName> <fullName>George Washington</fullName> </Person> |
4 | Envelope document, as generated by instance-to-envelope | <es:envelope xmlns:es="http://marklogic.com/entity-services"> <es:instance> <es:info> <es:title>Person</es:title> <es:version>1.0.0</es:version> </es:info> <Person> <id>1234</id> <firstName>George</firstName> <lastName>Washington</lastName> <fullName>George Washington</fullName> </Person> </es:instance> <es:attachments> <person> <pid>1234</pid> <given>George</given> <family>Washington</family> </person> </es:attachments> </es:envelope> |
5 | json:object (map:map) representation extracted from envelope document by es:instance-from-document or es.instanceFromDocument. Shown here as JSON for readability, this is really a map:map in XQuery. In JavaScript, this function returns a JavaScript object. The value is mutable. | { "id": "1234", "firstName": "George", "lastName": "Washington", "fullName": "George Washington", "$type": "Person" } |
6 | XML representation extracted from envelope document by es:instance-xml-from-document. The value is immutable. | <Person> <id>1234</id> <firstName>George</firstName> <lastName>Washington</lastName> <fullName>George Washington</fullName> </Person> |
7 | JSON representation extracted from envelope document by es:instance-json-from-document. This function returns a JSON object node. The value is immutable. | { "Person": { "id": "1234", "firstName": "George", "lastName": "Washington", "fullName": "George Washington" } } |
The representations you see on lines 2, 3, and 4 were created by an instance converter module. For details, see Creating an Instance Converter Module. The representation on line 2 is a transient, mutable in-memory representation designed for ease of use in instance converter code. If you pass an envelope document to the convert-instance-T function of a version translator module, it returns a similar representation; for details, see Creating a Model Version Translator Module.
The envelope document representation on line 4 is the recommended way to store entity instances in MarkLogic. You can customize the contents of your envelope, but should usually leave the es:instance portion as-is. This is the layout produced by the instance-to-envelope function of an instance converter.
The representations on lines 5, 6, and 7 are instances extracted from an envelope document using the Entity Services API. The map:map representation on line 5 differs from the other extracted instances in that it is mutable and carries explicit type information in the $type property. This representation differs from the one on line 2 in that it contains only the entity type properties of the instance. There is no $attachments property. For more details, see Extracting an Entity Instance from an Envelope Document.
This example uses the Person entity type from the model defined in Getting Started With Entity Services.
Line | Representation | Example |
---|---|---|
1 | Raw Source | { "pid": 2345, "given": "Martha", "family": "Washington" } |
2 | In-memory instance, as returned by the extract-instance-T function of the instance converter. Shown here as JSON for readability, but really a json:object (map:map). | { "$type": "Person", "$attachments": { "pid": 2345, "given": "Martha", "family": "Washington" }, "id": 2345, "firstName": "Martha", "lastName": "Washington", "fullName": "Martha Washington" } |
3 | Canonical JSON representation | {"Person": { "id":"2345", "firstName": "Martha", "lastName": "Washington", "fullName": "Martha Washington" }} |
4 | JSON envelope document, as generated by instance-to-envelope | {"envelope": { "instance": { "info": { "title": "Person", "version": "1.0.0" }, "Person": { "id": "2345", "firstName": "Martha", "lastName": "Washington", "fullName": "Martha Washington" } }, "attachments": [ { "pid": 2345, "given": "Martha", "family": "Washington" } ] } } |
5 | json:object (map:map) representation extracted from envelope document by es:instance-from-document or es.instanceFromDocument. Shown here as JSON for readability, this is really a map:map in XQuery. In JavaScript, this function returns a JavaScript object. The value is mutable. | { "$type": "Person", "id":"2345", "firstName":"Martha", "lastName":"Washington", "fullName":"Martha Washington" } |
6 | XML representation extracted from envelope document by es:instance-xml-from-document. The value is immutable. | <Person> <id>2345</id> <firstName>Martha</firstName> <lastName>Washington</lastName> <fullName>Martha Washington</fullName> </Person> |
7 | JSON representation extracted from envelope document by es:instance-json-from-document. This function returns a JSON object node. The value is immutable. | { "Person": { "id":"2345", "firstName":"Martha", "lastName":"Washington", "fullName":"Martha Washington" }} |
The representations you see on lines 2, 3, and 4 were created by an instance converter module. For details, see Creating an Instance Converter Module. The representation on line 2 is a transient, mutable in-memory representation designed for ease of use in instance converter code. If you pass an envelope document to the convert-instance-T function of a version translator module, it returns a similar representation; for details, see Creating a Model Version Translator Module.
The envelope document representation on line 4 is the recommended way to store entity instances in MarkLogic. You can customize the contents of your envelope, but should usually leave the instance portion as-is. This is the layout produced by the instance-to-envelope function of an instance converter.
The representations on lines 5, 6, and 7 are instances extracted from an envelope document using the Entity Services API. The map:map representation on line 5 differs from the other extracted instances in that it is mutable and carries explicit type information in the $type property. This representation differs from the one on line 2 in that it contains only the entity type properties of the instance. There is no $attachments property. For more details, see Extracting an Entity Instance from an Envelope Document.
The Entity Services API does not dictate how you create an entity instance from source data, but the recommended process is as follows:
Use the extract-instance-T and instance-to-envelope functions of the instance converter module to create instance envelope documents for some entity type T from source data.
By convention, instances are stored as child XML elements or JSON properties of an envelope document. You can extract an instance from an envelope document in several formats. For details, see Extracting an Entity Instance from an Envelope Document.
The following code illustrates one way to create envelope documents from raw source. In this example, the source data comes from documents in MarkLogic that are in a collection named raw, and instances are generated for an entity type named Person. The generated envelope documents are in XML format; you could also choose JSON. This example uses the converter and data from Getting Started With Entity Services.
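The two-step flow, extracting an instance and then wrapping it in an envelope, can be sketched in plain JavaScript. The functions below are hypothetical stand-ins named after the converter's extract-instance-T and instance-to-envelope functions; they are not the generated converter code and do not call MarkLogic. To keep the sketch runnable outside the server, it assumes JSON source and JSON envelopes rather than the XML case described above.

```javascript
// Hypothetical stand-in for the converter's extract-instance-Person function:
// maps raw source fields onto the Person entity type's properties.
function extractInstancePerson(source) {
  return {
    $attachments: source,   // keep the raw source alongside the instance
    $type: "Person",
    id: String(source.pid), // the model declares id as a string
    firstName: source.given,
    lastName: source.family,
    fullName: source.given + " " + source.family
  };
}

// Hypothetical stand-in for instance-to-envelope: wraps the instance
// properties and the original source in a JSON envelope.
function instanceToEnvelope(instance, info) {
  const { $attachments, $type, ...properties } = instance;
  return {
    envelope: {
      instance: { info: info, [$type]: properties },
      attachments: [$attachments]
    }
  };
}

// Stand-in for the documents in the "raw" collection.
const rawDocs = [{ pid: 2345, given: "Martha", family: "Washington" }];
const modelInfo = { title: "Person", version: "1.0.0" };

const envelopes = rawDocs.map(
  (doc) => instanceToEnvelope(extractInstancePerson(doc), modelInfo)
);
console.log(JSON.stringify(envelopes[0], null, 2));
```

Keeping normalization such as the fullName derivation inside the extraction step matches the guidance that derived property values belong in extract-instance-T.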
The resulting envelope documents have the following form by default. The instance data is accessible in an envelope document via the XPath expression //es:instance (or //*:instance). The original source from which the instance was derived is accessible via the XPath expression //es:attachments (or //*:attachments).
<es:envelope xmlns:es="http://marklogic.com/entity-services"> <es:instance> <es:info> <es:title>Person</es:title> <es:version>1.0.0</es:version> </es:info> <Person> <id>1234</id> <firstName>George</firstName> <lastName>Washington</lastName> <fullName>George Washington</fullName> </Person> </es:instance> <es:attachments> <person> <pid>1234</pid> <given>George</given> <family>Washington</family> </person> </es:attachments> </es:envelope>
If you generate JSON envelopes rather than XML envelopes, you get envelopes of the following form by default. The instance data is accessible in an envelope document via the XPath expression //instance (or //*:instance). The original source from which the instance was derived is accessible via the XPath expression //attachments (or //*:attachments).
{ "envelope": { "instance": { "info": { "title": "Person", "version": "1.0.0" }, "Person": { "id": "1234", "firstName": "George", "lastName": "Washington", "fullName": "George Washington" } }, "attachments": [ "<person><pid>1234<\/pid><given>George<\/given><family>Washington<\/family><\/person>" ] } }
If your model specifies a namespace binding for an entity type and you use JSON envelopes, the namespace is discarded in the JSON representation. However, the generated code and configuration artifacts still assume a namespace, so they will not work properly with JSON envelope documents. You should use XML envelope documents for entity types that define a namespace binding.
For an end-to-end example of creating envelope documents using this model, see Getting Started With Entity Services.
You can generate test instances from a model using the es:model-get-test-instances XQuery function or es.modelGetTestInstances Server-Side JavaScript function. You can use test instances for tasks such as experimenting with model refinement and testing code that manipulates instances.
The test instances are based purely on the model and do not reflect data normalization or customization you add to your instance converter. The test instances can help you identify properties for which converter customization is required.
The es:model-get-test-instances and es.modelGetTestInstances functions return a sequence of instances, one for each entity type defined in the input model.
If an entity type property definition contains a local reference, the referenced entity type is assumed to be embedded in the referencing entity. If an entity type property definition contains an external reference, no meaningful test value can be generated.
For example, assume the following model defining two entity types, Name and Person. A Person contains a local reference to a Name.
{ "info": { "title": "Example", "version": "1.0.0", "description": "ES Examples" }, "definitions": { "Name": { "properties": { "first": { "datatype": "string" }, "last": { "datatype": "string" } } }, "Person": { "properties": { "id": { "datatype": "int" }, "name": { "$ref": "#/definitions/Name" } } } } }
If you generate test instances from this model, the name property of the Person test instance contains a Name instance value:
<Person> <id>123</id> <name> <Name> <first>some string</first> <last>some string</last> </Name> </name> </Person>
If the name property of a Person entity were instead an external reference, such as http://example.com/SomeType, then no meaningful test value could be generated. The Person test instance would look like the following:
<Person> <id>123</id> <name><SomeType>externally-referenced-instance</SomeType></name> </Person>
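The difference between local and external references can be sketched in plain JavaScript. This is an illustration of the rule described above, not the implementation of es.modelGetTestInstances: the function names are hypothetical, the placeholder values are copied from the examples in this section, and only the string and int datatypes from the example model are handled.

```javascript
const LOCAL_PREFIX = "#/definitions/";

// Hypothetical helper: builds a placeholder value for one property definition.
function testValue(property, definitions) {
  if (property.$ref) {
    if (property.$ref.startsWith(LOCAL_PREFIX)) {
      // Local reference: embed a test instance of the referenced type.
      const localType = property.$ref.slice(LOCAL_PREFIX.length);
      return testInstance(localType, definitions);
    }
    // External reference: nothing meaningful can be generated.
    const externalType = property.$ref.split("/").pop();
    return { [externalType]: "externally-referenced-instance" };
  }
  // Placeholder scalars; any datatype other than string falls back to 123.
  return property.datatype === "string" ? "some string" : 123;
}

// Hypothetical helper: builds one test instance, keyed by its type name.
// No cycle protection: a type that references itself would recurse forever.
function testInstance(typeName, definitions) {
  const instance = {};
  const properties = definitions[typeName].properties;
  for (const [name, property] of Object.entries(properties)) {
    instance[name] = testValue(property, definitions);
  }
  return { [typeName]: instance };
}

// One test instance per entity type defined in the model.
function generateTestInstances(model) {
  return Object.keys(model.definitions).map(
    (typeName) => testInstance(typeName, model.definitions)
  );
}

const model = {
  info: { title: "Example", version: "1.0.0", description: "ES Examples" },
  definitions: {
    Name: { properties: { first: { datatype: "string" }, last: { datatype: "string" } } },
    Person: { properties: { id: { datatype: "int" }, name: { $ref: "#/definitions/Name" } } }
  }
};

const instances = generateTestInstances(model);
console.log(JSON.stringify(instances, null, 2));
```

Changing the name property's $ref to http://example.com/SomeType makes the sketch emit the placeholder object instead of an embedded Name instance.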
To generate instances from real source data, use an instance converter. For more details, see Creating an Instance Converter Module and Creating an Entity Instance from a Data Source.
Though Entity Services encourages storing your instances in MarkLogic in the form of envelope documents, downstream consumers of your data, such as client applications, will probably expect to receive the canonical instance data, not the entire envelope.
The Entity Services API includes the following XQuery functions for extracting an instance from an envelope document. The corresponding JavaScript functions follow.
XQuery Function | Extracted Instance Format |
---|---|
es:instance-from-document | map:map (json:object, mutable) |
es:instance-json-from-document | object-node() (immutable) |
es:instance-xml-from-document | element() (immutable) |
The Entity Services API includes the following Server-Side JavaScript functions for extracting an instance from an envelope document.
For example, suppose you have the following envelope document in the database with the URI /es-gs/env/1234.xml:
<es:envelope xmlns:es="http://marklogic.com/entity-services"> <es:instance> <es:info> <es:title>Person</es:title> <es:version>1.0.0</es:version> </es:info> <Person> <id>1234</id> <firstName>George</firstName> <lastName>Washington</lastName> <fullName>George Washington</fullName> </Person> </es:instance> <es:attachments> <person> <pid>1234</pid> <given>George</given> <family>Washington</family> </person> </es:attachments> </es:envelope>
Then, the following code snippet extracts an instance from the envelope document as a json:object in XQuery or a JavaScript object in JavaScript.
The result is a sequence containing one item, equivalent to the following JSON:
{ "id":"1234", "firstName":"George", "lastName":"Washington", "fullName":"George Washington", "$type": "Person" }
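To make the shape of that result concrete, here is a plain JavaScript sketch that derives the same object from a JSON envelope. It does not call es.instanceFromDocument; instanceFromEnvelope is a hypothetical helper, and it assumes the default JSON envelope layout with a single entity type under the instance property.

```javascript
// Hypothetical helper: pulls the instance out of a default-layout JSON
// envelope and tags it with its entity type name.
function instanceFromEnvelope(doc) {
  const instance = doc.envelope.instance;
  // Every key under "instance" other than "info" is assumed to be the type name.
  const typeName = Object.keys(instance).find((key) => key !== "info");
  if (typeName === undefined) {
    throw new Error("envelope contains no instance data");
  }
  // Copy the properties so the caller gets a mutable object of its own.
  return { ...instance[typeName], $type: typeName };
}

const envelopeDoc = {
  envelope: {
    instance: {
      info: { title: "Person", version: "1.0.0" },
      Person: {
        id: "1234",
        firstName: "George",
        lastName: "Washington",
        fullName: "George Washington"
      }
    },
    attachments: [
      "<person><pid>1234</pid><given>George</given><family>Washington</family></person>"
    ]
  }
};

const extracted = instanceFromEnvelope(envelopeDoc);
console.log(extracted.$type, extracted.fullName);
```

Note that the extracted object carries the $type property but nothing from the attachments section.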
The following table illustrates the result of calling each of the instance envelope extraction functions.
For more detailed coverage of instance representations, see What is an Instance? and Example: Entity Instance Representations.
If you follow the Entity Services conventions, an envelope document encapsulates both the canonical instance data and the raw source from which it was derived. This encapsulation happens when you call the instance-to-envelope XQuery function in a model's generated instance converter module.
You can extract the attachments from an envelope document using the es:instance-get-attachments XQuery function or the es.instanceGetAttachments JavaScript function. You can use these functions on a customized envelope, as long as the attachments are locatable via the XPath expression //es:attachments.
The raw source data is saved in the envelope as an attachment. For example, the <person/> element inside es:attachments below is the raw XML source from which the enveloped instance was derived.
<es:envelope xmlns:es="http://marklogic.com/entity-services"> <es:instance>...</es:instance> <es:attachments> <person> <pid>1234</pid> <given>George</given> <family>Washington</family> </person> </es:attachments> </es:envelope>
If the format of the source data does not match the format of the envelope, the source data is serialized and stored in the envelope as a string. For example, if the source data is JSON and the envelope is XML, then the source is stored as the text value of an es:attachments XML element. The following snippet is from an XML envelope document created from JSON source:
<es:envelope xmlns:es="http://marklogic.com/entity-services"> <es:instance>...</es:instance> <es:attachments>{"pid":2345, "given":"Martha", "family":"Washington"}</es:attachments> </es:envelope>
The following code extracts the raw source attachment from an envelope document, assuming it is the only attachment.
If there are multiple children in the //es:attachments element, you are responsible for picking out the raw source from the other attachments. There will only be multiple attachments if you explicitly add extra attachments.
If the original source attachment and the envelope format do not match, you must convert the serialization if you want to work with the data in its original form. For example, the following code deserializes a serialized JSON attachment from an XML envelope document, and then accesses one of its properties.
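The deserialization step can be sketched in plain JavaScript. The sketch assumes the XML envelope is available as a string and uses a naive regular expression to pull out the attachment text, purely for illustration; inside MarkLogic you would obtain the attachment with the attachment functions mentioned above, and a real XML parser would also decode any escaped characters.

```javascript
// XML envelope created from JSON source: the attachment is serialized JSON text.
const xmlEnvelope =
  '<es:envelope xmlns:es="http://marklogic.com/entity-services">' +
  "<es:instance>...</es:instance>" +
  '<es:attachments>{"pid":2345, "given":"Martha", "family":"Washington"}</es:attachments>' +
  "</es:envelope>";

// Naive extraction of the attachment text (illustration only).
const match = xmlEnvelope.match(/<es:attachments>([\s\S]*?)<\/es:attachments>/);
if (match === null) {
  throw new Error("no es:attachments element found");
}

// Deserialize the string back into its original JSON form.
const source = JSON.parse(match[1]);

// The original properties are now directly accessible.
console.log(source.given); // prints "Martha"
```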
The following code is a similar example that extracts an XML attachment from a JSON envelope:
As your model changes, you might need to update your instance data to match. Model changes can also impact generated and configuration artifacts. For details, see Managing Model Changes.
Let's say you have an object Role, and it contains a number of Role objects as children. Further, let's say you have an instance of Role called permanentWorker, and that you have a requirement to match on an element in permanentWorker named personnelNumber. This means you need to be able to match on an element that is three levels deep in the Role hierarchy. Here are some options at your disposal for making the desired match:
The first option is to make the data conform to the entity model. Assuming you have an entity model named RoleModel, the nested item could then be matched against a property named RoleModel.personnelNumber. More information about entity instances and their canonical representation can be found in the section Generating Test Entity Instances. Take a look at the example there of a Person entity that has a child Name entity, which illustrates how nested entity instances should be composed.
Another approach is to create different models for each role to be assigned. The resulting instance would look something like this:
"permanentWorker":{ "permanentWorkerModel": { "personnelNumber":"12345", "positionNumber":"111111", "positionName":"Program Manager", "officeLocation":"1801 Main Street", "department":"Health Care" } }, "temporaryWorker":{ "temporaryWorkerModel":{ "supplier":"12345", "assignmentNumber":"111111", "officeLocation":"1801 Main Street", "department":"Health Care" } }, "vendorWorker":{ "vendorWorkerModel":{ "vendor":"12345", "officeLocation":"1801 Main Street", "department":"Health Care" } }
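To see where a property such as personnelNumber sits in an instance like this, the following plain JavaScript sketch collects every value stored under a given property name at any depth. It is only an aid for inspecting the nesting: findValues is a hypothetical helper, it is unrelated to how Data Hub matching is configured or executed, and the sample object is abbreviated from the instance above and wrapped in braces to make it valid JSON.

```javascript
// Hypothetical helper: recursively collects every value stored under
// the given property name, at any depth of the object tree.
function findValues(node, name, found = []) {
  if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key === name) {
        found.push(value);
      }
      findValues(value, name, found); // descend into nested objects and arrays
    }
  }
  return found;
}

// Abbreviated form of the instance shown above.
const roles = {
  permanentWorker: {
    permanentWorkerModel: {
      personnelNumber: "12345",
      officeLocation: "1801 Main Street",
      department: "Health Care"
    }
  },
  temporaryWorker: {
    temporaryWorkerModel: {
      supplier: "12345",
      officeLocation: "1801 Main Street",
      department: "Health Care"
    }
  }
};

console.log(findValues(roles, "personnelNumber"));
console.log(findValues(roles, "officeLocation").length);
```

Only the permanentWorker model defines personnelNumber, while officeLocation appears under each role model.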
Yet another option is to remove the targetEntity option from your Data Hub Matching Step Settings. This tells the Data Hub Framework not to enforce a valid entity model.