MarkLogic 9 Product Documentation
Entity Services Developer's Guide, Chapter 2

Getting Started With Entity Services

This chapter walks through a simple Entity Services example: creating a model, creating entity instances from source data, and querying the model and instances. Choose either the XQuery walkthrough or the Server-Side JavaScript walkthrough.

Before You Begin

All the exercises in this section use the Query Console browser application to evaluate code on MarkLogic Server. You can launch Query Console by navigating to port 8000 of a host running MarkLogic.

For example, if MarkLogic is installed on localhost, launch Query Console by opening the following location in your browser:

http://localhost:8000 

To use Query Console, you must have the qconsole-user role or equivalent privileges. You can learn more about Query Console in the Query Console User Guide.

You do not require special security privileges to use the Entity Services API. However, some exercises in this chapter involve deploying application code to MarkLogic, so you should log in to Query Console as a user with the admin role or equivalent privileges.

Some exercises in this chapter save generated code and configuration artifacts to the local filesystem on the host where MarkLogic is installed, and later read them back. You can choose any directory, but the directory must be readable and writable by MarkLogic and by you. The instructions use the variable ARTIFACT_DIR to represent this directory.
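
As a quick sanity check before starting, the following sketch lists the files MarkLogic can see in the artifact directory. It is an illustration rather than part of the official walkthrough: the path /space/es/gs/ is only a placeholder, and the check covers read access only, not write access.

```xquery
xquery version "1.0-ml";
declare namespace dir = "http://marklogic.com/xdmp/directory";

(: ASSUMPTION: placeholder path; replace with your own ARTIFACT_DIR :)
let $ARTIFACT_DIR := '/space/es/gs/'
return
  try {
    (: list the names of the entries MarkLogic can read in the directory :)
    xdmp:filesystem-directory($ARTIFACT_DIR)/dir:entry/dir:filename/fn:string()
  } catch ($e) {
    (: the directory is missing or not readable by the MarkLogic process :)
    fn:concat("Cannot read ", $ARTIFACT_DIR, ": ", $e/error:message)
  }
```

An empty result simply means the directory exists but does not contain any files yet.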

Optional: Create a Content Database

You can use any database for the exercises in this chapter. However, if you would like to isolate this work from the rest of your environment, you can use the procedure in this section to create a new content database named es-gs, with one forest of the same name attached to it.

The following procedure uses the XQuery Admin API to create a database and a forest, and then attach the forest to the database. You could also use the Admin Interface or the REST Management API.

  1. Navigate to Query Console in your browser. For example, if MarkLogic is installed on localhost, navigate to the following URL:
    http://localhost:8000/qconsole
  2. When prompted for login credentials, log in as a user with admin privileges.
  3. Add a new query to the workspace by clicking on the + button on the query editor.
  4. Select XQuery in the Query Type dropdown.
  5. Copy and paste the following code into the new query. This code creates a forest and a database, and then attaches the forest to the database.
    (: create a database :)
    xquery version "1.0-ml";
    import module namespace admin = "http://marklogic.com/xdmp/admin"
        at "/MarkLogic/admin.xqy";
    admin:save-configuration(
      admin:database-create(admin:get-configuration(),
        "es-gs", xdmp:database("Security"), xdmp:database("Schemas")));

    (: create a forest :)
    xquery version "1.0-ml";
    import module namespace admin = "http://marklogic.com/xdmp/admin"
        at "/MarkLogic/admin.xqy";
    admin:save-configuration(
      admin:forest-create(admin:get-configuration(),
        "es-gs", xdmp:host(), ()));

    (: attach the forest to the database :)
    xquery version "1.0-ml";
    import module namespace admin = "http://marklogic.com/xdmp/admin"
        at "/MarkLogic/admin.xqy";
    admin:save-configuration(
      admin:database-attach-forest(admin:get-configuration(),
        xdmp:database("es-gs"), xdmp:forest("es-gs")));
  6. Click the Run button. A database named es-gs is created.
  7. Optionally, confirm the existence of the new database by browsing to the Admin Interface. For example, browse to http://localhost:8001 and observe es-gs in the list of databases.
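
If you prefer to confirm the result from Query Console rather than the Admin Interface, a check along the following lines should work. This is a minimal sketch that assumes the database and forest names used in the procedure above.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
return
  if (admin:database-exists($config, "es-gs")
      and admin:forest-exists($config, "es-gs"))
  then
    (: list the names of the forests attached to the es-gs database :)
    xdmp:database-forests(xdmp:database("es-gs")) ! xdmp:forest-name(.)
  else "es-gs database or forest not found"
```

If the forest was attached successfully, its name should appear in the result.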

Getting Started Using XQuery

This section uses XQuery and XML to introduce the Entity Services APIs. If you prefer to use Server-Side JavaScript, see Getting Started Using JavaScript. You can also use JSON with XQuery and XML with JavaScript, but these combinations are not illustrated here.

Stage the Source Data

This exercise ingests the raw source data from which we will create entity instances. One benefit of Entity Services is that you do not have to model your data up front. You can load your data as-is and use it in your application, and then incrementally model your entities.

You usually create entity instances from XML or JSON data. The raw data in this example is two XML documents and one JSON document. Each document contains information about a person, such as first name and last name. Each person document also includes a unique person identifier (pid).

Use the following procedure to load the raw source documents into your content database. The newly created documents are put into a collection named raw so we can easily reference them later.

  1. Navigate to Query Console in your browser. For example, if MarkLogic is installed on localhost, navigate to the following URL:
    http://localhost:8000/qconsole
  2. Add a new query to the workspace by clicking on the + button on the query editor.
  3. Select XQuery in the Query Type dropdown.
  4. Select your content database from the Database dropdown.
  5. Copy and paste the following code into the new query.
    (: Stage raw source in the form of 2 XML and 1 JSON document :)
    xquery version "1.0-ml";
    import module namespace es = "http://marklogic.com/entity-services"
        at "/MarkLogic/entity-services/entity-services.xqy";
    
    (: Synthesize source data in memory. Normally, this would come
     : from an external source. :)
    let $source-data := (
      <person>
        <pid>1234</pid>
        <given>George</given>
        <family>Washington</family>
      </person>,
      xdmp:unquote('
        {"pid": 2345,
         "given": "Martha",
         "family": "Washington"}
      ')/node(),
      <person>
        <pid>3456</pid>
        <given>Alexander</given>
        <family>Hamilton</family>
      </person>
    )
    for $source in $source-data return
      let $uri-suffix := 
        typeswitch ($source)
        case element() return ".xml"
        case object-node() return ".json"
        default return ()
      return xdmp:document-insert(
        fn:concat('/es-gs/raw/', $source/pid, $uri-suffix),
        $source,
        <options xmlns="xdmp:document-insert">
          <collections>
            <collection>raw</collection>
          </collections>
        </options>
      )
  6. Click the Run button. Three documents are created in the database.
  7. Optionally, click the Explore button and observe that the following documents were created in the raw collection.

    /es-gs/raw/1234.xml
    /es-gs/raw/2345.json
    /es-gs/raw/3456.xml
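
As an alternative to the Explore button, the staged documents can be listed with a short query. The following sketch assumes it is run against the same content database and that no other documents are in the raw collection.

```xquery
xquery version "1.0-ml";

(: list the URI of every document in the raw collection :)
for $doc in fn:collection('raw')
order by xdmp:node-uri($doc)
return xdmp:node-uri($doc)
```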

Create a Model Descriptor

You define the entity types, attributes, and relationships of your model in an XML or JSON model descriptor. The model descriptor is the foundation for the model. Model descriptors are discussed in detail in Creating and Managing Models.

The model descriptor in this example is based on the Person example from the Entity Services examples on GitHub. For more details about the original example, see Exploring the Entity Services Open-Source Examples.

This exercise saves an XML model descriptor as a file on the filesystem. Discussion of the descriptor follows the procedure. For an equivalent JSON example, see Create a Model Descriptor in Getting Started Using JavaScript.

  1. Choose a filesystem directory on your MarkLogic host to hold the model descriptor file. The exercises in this chapter use ARTIFACT_DIR to represent this location.
  2. Create a text file named person-desc.xml in ARTIFACT_DIR with the following contents.
    <es:model xmlns:es="http://marklogic.com/entity-services">
      <es:info>
        <es:title>Person</es:title>
        <es:version>1.0.0</es:version>
        <es:base-uri>http://example.org/example-person/</es:base-uri>
        <es:description>
          A model of a person, to demonstrate several extractions
        </es:description>
      </es:info>
      <es:definitions>
        <Person>
          <es:properties>
            <id><es:datatype>string</es:datatype></id>
            <firstName><es:datatype>string</es:datatype></firstName>
            <lastName><es:datatype>string</es:datatype></lastName>
            <fullName><es:datatype>string</es:datatype></fullName>
            <friends>
              <es:datatype>array</es:datatype>
              <es:items><es:ref>#/definitions/Person</es:ref></es:items>
            </friends>
          </es:properties>
          <es:primary-key>id</es:primary-key>
          <es:required>firstName</es:required>
          <es:required>lastName</es:required>
          <es:required>fullName</es:required>
        </Person>
      </es:definitions>
    </es:model>
  3. Set the permissions on ARTIFACT_DIR and the newly created file so that MarkLogic can read the file.

You now have a file named ARTIFACT_DIR/person-desc.xml that contains the Person model descriptor.

We stored the model on the filesystem because this most closely resembles a real development cycle, in which an important project artifact like the model descriptor is under source control.

The descriptor defines a single entity type named Person. A Person entity instance contains string-valued properties named id, firstName, lastName, and fullName, and a list-valued property named friends.

<Person>
  <es:properties>
    <id><es:datatype>string</es:datatype></id>
    <firstName><es:datatype>string</es:datatype></firstName>
    <lastName><es:datatype>string</es:datatype></lastName>
    <fullName><es:datatype>string</es:datatype></fullName>
    <friends>
      <es:datatype>array</es:datatype>
      <es:items><es:ref>#/definitions/Person</es:ref></es:items>
    </friends>
    ...

The friends property is a list (array) of references to other Person entities. Since the reference to Person appears in the same descriptor in which Person is defined, it is a local reference. Entity Services knows the shape of the referenced entity type when generating code from a Person model. You can also reference entity types defined elsewhere.

The firstName, lastName, and fullName properties must all be present in every Person entity instance because these properties are explicitly flagged as required through the use of <es:required/>:

<es:required>firstName</es:required>
<es:required>lastName</es:required>
<es:required>fullName</es:required>

The id property is implicitly required because it is identified as the primary key for a Person:

<es:primary-key>id</es:primary-key>

The primary key is a unique identifier for an entity instance. You are not required to define a primary key, but the existence of a primary key facilitates other Entity Services features; for details, see Identifying the Primary Key Entity Property.

Since the friends property is neither a primary key nor an explicitly required property, it is optional. That is, you can create entities that do not include a friends property.

You can also flag properties with other characteristics, such as whether or not a property should be indexed for efficient search. For more details, see Writing a Model Descriptor.

Create a Model

Inserting an XML or JSON model descriptor document into the special collection http://marklogic.com/entity-services/models tells MarkLogic the document is part of an Entity Services model. Membership in this collection causes MarkLogic to generate semantic triples that define the model.

We authored a model descriptor in Create a Model Descriptor. The following procedure covers the validation and persistence steps that create the model. An explanation of the code follows the procedure.

  1. Open Query Console in your browser if you do not already have it open.
  2. Add a new query to the workspace by clicking on the + button on the query editor.
  3. Select XQuery in the Query Type dropdown.
  4. Select your content database from the Database dropdown.
  5. Copy and paste the following code into the new query. This code creates a model from a descriptor.
    (: Create a model. :)
    xquery version "1.0-ml";
    import module namespace es = "http://marklogic.com/entity-services"
      at "/MarkLogic/entity-services/entity-services.xqy";
    
    let $ARTIFACT_DIR := '/space/es/gs/'
    let $desc := xdmp:document-get(
      fn:concat($ARTIFACT_DIR, 'person-desc.xml'))
    let $validated-desc := es:model-validate($desc)
    let $desc-as-json := xdmp:to-json($validated-desc)
    return xdmp:document-insert(
      '/es-gs/models/person-1.0.0.json', $desc-as-json,
      <options xmlns="xdmp:document-insert">  
        <collections>{
          <collection>http://marklogic.com/entity-services/models</collection>,
          for $coll in xdmp:default-collections()
          return <collection>{$coll}</collection>
        }</collections>
      </options>
    )
  6. Change the value of the ARTIFACT_DIR variable to the directory where you saved the model descriptor in Create a Model Descriptor. Include the trailing directory separator in the pathname.
  7. Click the Run button. A model is created. The descriptor is persisted as a document with the URI /es-gs/models/person-1.0.0.json.

    If the query is unable to open the input model descriptor file, check the permissions on the directory and file.

  8. Optionally, click the Explore button at the top of the query editor to view the JSON version of the descriptor.

The first step is to validate the descriptor. Validation introduces overhead, but an invalid descriptor will produce an invalid model, so validation is recommended during development.

let $desc := xdmp:document-get(
  fn:concat($ARTIFACT_DIR, 'person-desc.xml'))
let $validated-desc := es:model-validate($desc)

The function es:model-validate returns a json:object representation of the descriptor. A json:object is a special kind of map:map. This is the form expected by Entity Services API functions that operate on the model, but it is not the proper form for creating a model. Instead, you must persist an XML or JSON descriptor.

If you persist a descriptor as XML, then you must use es:model-validate or es:model-from-xml to convert it to the map:map form if you extract it from the database to pass to an Entity Services function. If you persist the descriptor as JSON, then subsequent conversion is not necessary. Therefore, this example persists a JSON version of the original XML descriptor.

The function xdmp:to-json converts the json:object created by es:model-validate into a JSON object-node that represents the JSON version of our XML descriptor. For example:

let $desc-as-json := xdmp:to-json($validated-desc)
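
While iterating on a descriptor, it can be convenient to run only the validation and conversion steps, without persisting anything. The following is a minimal sketch of that idea. It assumes the same placeholder ARTIFACT_DIR as the procedure above, and that a validation failure is raised as an ordinary MarkLogic exception that try/catch can intercept.

```xquery
xquery version "1.0-ml";
import module namespace es = "http://marklogic.com/entity-services"
  at "/MarkLogic/entity-services/entity-services.xqy";

let $ARTIFACT_DIR := '/space/es/gs/'     (: MODIFY THIS VALUE :)
let $desc := xdmp:document-get(
  fn:concat($ARTIFACT_DIR, 'person-desc.xml'))
return
  try {
    (: on success, show the JSON form that would be persisted :)
    xdmp:to-json(es:model-validate($desc))
  } catch ($e) {
    (: on failure, report the error code and message :)
    fn:concat($e/error:code, ": ", $e/error:message)
  }
```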

Finally, we insert the descriptor into the database as part of the special Entity Services collection to create the model. The following document insertion adds the Entity Services collection to any default collections associated with the user performing the insertion.

xdmp:document-insert(
  '/es-gs/models/person-1.0.0.json', $desc-as-json,
  <options xmlns="xdmp:document-insert">
    <collections>{
      <collection>http://marklogic.com/entity-services/models</collection>,
      for $coll in xdmp:default-collections()
      return <collection>{$coll}</collection>
    }</collections>
  </options>
)
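
To confirm that the descriptor landed in the Entity Services collection, you can inspect the persisted document. The following sketch assumes the URI used in the procedure above and that the JSON descriptor keeps the model title under info/title.

```xquery
xquery version "1.0-ml";

let $uri := '/es-gs/models/person-1.0.0.json'
return (
  (: expected to include http://marklogic.com/entity-services/models :)
  xdmp:document-get-collections($uri),
  (: the model title from the info section of the persisted descriptor :)
  fn:doc($uri)/info/title
)
```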

Create and Deploy an Instance Converter

An instance converter is a library module containing code for transforming your raw source data into entity instances that conform to your model. You can use the Entity Services API to generate a baseline converter, and then customize it to meet the requirements of your application.

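This section walks through deploying a converter module in the following steps:

  • Generate the Default Converter Module
  • Customize the Converter Module
  • Deploy the Converter Module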

Generate the Default Converter Module

This exercise creates an instance converter module template using the es:instance-converter-generate function. An explanation of the code follows the procedure.

  1. Open Query Console in your browser if you do not already have it open.
  2. Add a new query to the workspace by clicking on the + button on the query editor.
  3. Select XQuery in the Query Type dropdown.
  4. Select your content database from the Database dropdown.
  5. Copy and paste the following code into the new query. This code generates the instance converter module and saves it to the filesystem.
    (: Create an instance converter and save it to a file :)
    xquery version "1.0-ml";
    import module namespace es = "http://marklogic.com/entity-services"
      at "/MarkLogic/entity-services/entity-services.xqy";
    
    let $desc := fn:doc('/es-gs/models/person-1.0.0.json')
    let $ARTIFACT_DIR := '/space/es/gs/'     (: MODIFY THIS VALUE :)
    return xdmp:save(
      fn:concat($ARTIFACT_DIR, 'person-1.0.0-conv.xqy'),
      es:instance-converter-generate($desc)
    )
  6. Change the value of $ARTIFACT_DIR to a directory on your MarkLogic host where the generated code can be saved. Include the trailing directory separator in the pathname.

    The directory must be readable and writable by MarkLogic.

  7. Click the Run button. The file ARTIFACT_DIR/person-1.0.0-conv.xqy is created.
  8. Optionally, go to ARTIFACT_DIR and review the generated code. In the next section, we will modify this code.

Though the generated code is runnable as-is, you will need to customize the code to match the characteristics of your source data and the requirements of your application. The generated code contains extensive comments to assist you with customization.

We could insert the converter module directly into the modules database to which it will eventually be deployed. However, the converter is an important project artifact, so you would normally save it to a file and place it under source control before proceeding with customizations.

The generated module defines the following externally visible functions, plus some private helper functions. The namespace prefix defined for the module is derived from the model title.

  • person:extract-instance-Person - Create a Person instance from raw source data. The returned instance is a json:object (map:map). You are expected to customize this function to harmonize your source data with your model.
  • person:instance-to-envelope - Convert an entity instance into an XML or JSON envelope document that encapsulates the instance and the original source. Most applications will use this function as-is, but you might customize it if you include additional data in the envelope.
  • person:instance-to-canonical - Convert the map:map representation of an instance into its canonical XML or JSON representation. You will not usually need to customize this function or call it directly; it exists for use by the generated instance-to-envelope function.

For more details, see Creating an Instance Converter Module.

Customize the Converter Module

The converter module generated by Entity Services implements a modeltitle:extract-instance-T function for each entity type T defined in the descriptor. In our example, the converter module implements a person:extract-instance-Person function.

The default implementation of an instance converter assumes the source data has the same shape as a Person entity. However, our source data has pid, given, and family properties instead of id, firstName, lastName, and fullName. You must modify person:extract-instance-Person to do the following:

  • Extract id from pid
  • Extract firstName from given
  • Extract lastName from family
  • Synthesize fullName by concatenating given and family

Production applications can require many other types of customizations. For example, you might need to normalize a date value, perform a more sophisticated type conversion, or extract the value of an entity property from somewhere other than the source data.

Use the following procedure to customize the instance extraction code as described. A discussion of the code follows the procedure.

  1. Confirm you have read and write permissions on ARTIFACT_DIR/person-1.0.0-conv.xqy. If not, set the permissions accordingly. The file must also be readable by MarkLogic.
  2. Open ARTIFACT_DIR/person-1.0.0-conv.xqy in the text editor of your choice.
  3. Locate the section of person:extract-instance-Person that prepares the value of the id, firstName, lastName, and fullName properties. The code should look similar to the following:
    let $id  :=       $source-node/id ! xs:string(.)
    let $firstName := $source-node/firstName ! xs:string(.)
    let $lastName  := $source-node/lastName ! xs:string(.)
    let $fullName  := $source-node/fullName ! xs:string(.)
  4. Replace these lines with the following code. The changes are the source element names (pid, given, family) and the fn:concat expression that builds fullName.
    let $id  :=       $source-node/pid ! xs:string(.)
    let $firstName := $source-node/given ! xs:string(.)
    let $lastName  := $source-node/family ! xs:string(.)
    let $fullName  := fn:concat($firstName, " ", $lastName) ! xs:string(.)
  5. Save your changes.
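
Before deploying the module, the revised expressions can be tried in isolation in Query Console. The following sketch applies them to an in-memory copy of one source document and collects the results in a map. It only illustrates the field mapping and is not the generated function itself; the real person:extract-instance-Person also records the type and attachments.

```xquery
xquery version "1.0-ml";

(: sample raw source, copied from the staged data :)
let $source-node :=
  <person>
    <pid>1234</pid>
    <given>George</given>
    <family>Washington</family>
  </person>
(: the same expressions used in the customized converter :)
let $id        := $source-node/pid ! xs:string(.)
let $firstName := $source-node/given ! xs:string(.)
let $lastName  := $source-node/family ! xs:string(.)
let $fullName  := fn:concat($firstName, " ", $lastName)
return
  map:new((
    map:entry("id", $id),
    map:entry("firstName", $firstName),
    map:entry("lastName", $lastName),
    map:entry("fullName", $fullName)
  ))
```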

Recall that the Person entity type has id, firstName, lastName, fullName, and friends properties. The default implementation of person:extract-instance-Person assumes the source data contains the same properties. For example, the default implementation includes the following code:

let $id  := $source-node/id ! xs:string(.)
let $firstName := $source-node/firstName ! xs:string(.)
let $lastName := $source-node/lastName ! xs:string(.)
let $fullName := $source-node/fullName ! xs:string(.)

Our customization changes the names of the source fields to match our source data, and derives the fullName property from the given and family source values. The modified lines are shown below.

let $id  :=       $source-node/pid ! xs:string(.)
let $firstName := $source-node/given ! xs:string(.)
let $lastName  := $source-node/family ! xs:string(.)
let $fullName  := fn:concat($firstName, " ", $lastName) ! xs:string(.)

Deploy the Converter Module

Like any application code, the converter module must be deployed to MarkLogic before you can use it. Best practice is to install it in the modules database of your App Server. Our example uses the pre-defined App Server on port 8000, which is configured to use the Modules database.

The following procedure uses XQuery to install the customized converter module into the Modules database. You could also use Server-Side JavaScript or the REST, Java, or Node.js Client APIs for this task.

  1. Open Query Console in your browser if you do not already have it open.
  2. Add a new query to the workspace by clicking on the + button on the query editor.
  3. Select XQuery in the Query Type dropdown.
  4. Select the Modules database from the Database dropdown.
  5. Copy and paste the following code into the new query. This code saves the instance converter module to the database.
    xquery version "1.0-ml";
    let $ARTIFACT_DIR := '/space/es/gs/'     (: MODIFY THIS VALUE :)
    return xdmp:document-load(
      fn:concat($ARTIFACT_DIR, 'person-1.0.0-conv.xqy'),
      <options xmlns="xdmp:document-load">
        <uri>/es-gs/person-1.0.0-conv.xqy</uri>
      </options>
    )
  6. Modify the value of $ARTIFACT_DIR to the directory where you previously saved the converter module. Include the trailing directory separator in the pathname.
  7. Click the Run button. The converter module is inserted into the Modules database.
  8. Optionally, click the Explore button to confirm the presence of the module in the database.
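
A query can also confirm that the deployed module is the customized version rather than the generated default. The following sketch is meant to be run with the Modules database selected in the Database dropdown. It assumes the module URI from the procedure above and that the customized code contains the text $source-node/pid, as in the earlier edit.

```xquery
xquery version "1.0-ml";

let $uri := '/es-gs/person-1.0.0-conv.xqy'
return
  if (fn:not(fn:doc-available($uri)))
  then "module not found"
  (: look for one of the customized source paths in the module text :)
  else if (fn:contains(fn:string(fn:doc($uri)), '$source-node/pid'))
  then "customized module deployed"
  else "module deployed, but customization not detected"
```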

Create Entity Instances

An envelope document is the recommended way to persist and interact with entity instances in MarkLogic. An envelope document encapsulates an entity instance with model metadata and the original source. Storing the logical aspects of an entity (canonical instance representation, metadata, source) in one physical document facilitates managing, searching, retrieving, indexing, and securing your data.

An envelope document enables your application to query data as harmonized instances, but still recover the raw source when needed. You can generate either XML or JSON envelope documents.

You can use the person:instance-to-envelope function in the converter module to create entity envelope documents. The input is an instance created by calling person:extract-instance-Person. If you do not explicitly specify an envelope format of xml or json, the function generates an XML envelope.

Use the following procedure to create XML envelope documents from the source documents loaded in Stage the Source Data. Discussion of the code follows the procedure.

  1. Open Query Console in your browser if you do not already have it open.
  2. Add a new query to the workspace by clicking on the + button on the query editor.
  3. Select XQuery in the Query Type dropdown.
  4. Select your content database from the Database dropdown.
  5. Copy and paste the following code into the new query. This code creates a Person entity envelope XML document from each source document.
    (: Create envelope documents from raw source documents :)
    xquery version "1.0-ml";
    import module namespace es = "http://marklogic.com/entity-services"
        at "/MarkLogic/entity-services/entity-services.xqy";
    import module namespace person =
        "http://example.org/example-person/Person-1.0.0"
        at "/es-gs/person-1.0.0-conv.xqy";
    
    for $source in fn:collection('raw') return
      let $instance := person:extract-instance-Person($source)
      let $uri := 
        fn:concat('/es-gs/env/', map:get($instance, 'id'), '.xml')
      return xdmp:document-insert(
        $uri,
        person:instance-to-envelope($instance, "xml"),
        <options xmlns="xdmp:document-insert">
          <collections>
            <collection>person-envelopes</collection>
          </collections>
        </options>
      )
  6. Click the Run button. The following envelope documents are created in your content database:

    /es-gs/env/1234.xml
    /es-gs/env/2345.xml
    /es-gs/env/3456.xml

  7. Optionally, click the Explore button to confirm creation of the envelope documents.

An envelope document can be either XML or JSON. This exercise uses XML envelopes. An XML envelope has the following form. The es:attachments portion of the envelope holds the raw source data.

<es:envelope xmlns:es="http://marklogic.com/entity-services">
  <es:instance>
    <es:info>metadata from info section of descriptor</es:info>
    ...instance canonical XML...
  </es:instance>
  <es:attachments>
    source data
  </es:attachments>
</es:envelope>

The equivalent JSON envelope, generated by passing "json" as the second parameter of person:instance-to-envelope, has the following form:

{ "envelope": {
  "instance": {
    "info": { ...metadata from info section of descriptor... },
    ...instance canonical JSON...
  },
  "attachments": [ ...source data... ]
}}

Except when constructing path expressions, you do not usually have to be aware of the internal structure of an envelope document, because the Entity Services API includes functions that extract an instance or the attachments from an envelope document for you. For details, see Extracting an Entity Instance from an Envelope Document and Extracting the Original Source from an Envelope Document.
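
As an illustration of those extraction functions, the following sketch reads one of the envelopes created above and returns the canonical instance followed by the original source. It assumes the envelope URI /es-gs/env/1234.xml from this exercise. See the referenced sections for the authoritative description of es:instance-xml-from-document and es:instance-get-attachments.

```xquery
xquery version "1.0-ml";
import module namespace es = "http://marklogic.com/entity-services"
  at "/MarkLogic/entity-services/entity-services.xqy";

let $envelope := fn:doc('/es-gs/env/1234.xml')
return (
  (: the canonical XML instance held in es:instance :)
  es:instance-xml-from-document($envelope),
  (: the original source held in es:attachments :)
  es:instance-get-attachments($envelope)
)
```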

You create an envelope document for some entity type T and envelope format F using the extract-instance-T and instance-to-envelope functions of the instance converter. For example:

(: creating an XML envelope :)
modeltitle:instance-to-envelope(
  modeltitle:extract-instance-T($source), "xml")

(: creating a JSON envelope :)
modeltitle:instance-to-envelope(
  modeltitle:extract-instance-T($source), "json")

For example, the sample code does the following to create a Person entity XML envelope:

let $instance := person:extract-instance-Person($source)
...
return xdmp:document-insert(
  $uri,
  person:instance-to-envelope($instance, "xml"),
  ...)

Inside person:instance-to-envelope, the person:instance-to-canonical function is called to create the Person entity embedded inside es:envelope/es:instance.

The table below illustrates the progression from raw data to XML envelope document, through use of the instance converter module functions.

Operation: ingest raw source
Result:

<person>
  <pid>1234</pid>
  <given>George</given>
  <family>Washington</family>
</person>

Operation: extract-instance-Person($source)
Input: raw source
Output: a map:map (json:object), shown here serialized as JSON
Result:

{"$attachments": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<person>\n  <pid>1234</pid>\n  <given>George</given>\n  <family>Washington</family>\n</person>", 
  "$type": "Person", 
  "id": "1234", 
  "firstName": "George", 
  "lastName": "Washington", 
  "fullName": "George Washington"
}

Operation: instance-to-canonical($instance, "xml")
Input: instance map:map
Output: XML element
Result:

<Person>
  <id>1234</id>
  <firstName>George</firstName>
  <lastName>Washington</lastName>
  <fullName>George Washington</fullName>
</Person>

Operation: instance-to-envelope($instance, "xml")
Input: instance map:map
Output: XML envelope document
Result:

<es:envelope
    xmlns:es="http://marklogic.com/entity-services">
  <es:instance>
    <es:info>
      <es:title>Person</es:title>
      <es:version>1.0.0</es:version>
    </es:info>
    <Person>
      <id>1234</id>
      <firstName>George</firstName>
      <lastName>Washington</lastName>
      <fullName>George Washington</fullName>
    </Person>
  </es:instance>
  <es:attachments>
    <person>
      <pid>1234</pid>
      <given>George</given>
      <family>Washington</family>
    </person>
  </es:attachments>
</es:envelope>

The following is an equivalent JSON envelope, generated by calling instance-to-envelope($instance, "json"):

{ "envelope": {
  "instance": {
    "info": {
      "title":"Person", 
      "version":"1.0.0"
    }, 
    "Person": {
      "id":"1234", 
      "firstName":"George", 
      "lastName":"Washington", 
      "fullName":"George Washington"
    }
  }, 
  "attachments":[
    "<person><pid>1234</pid><given>George</given><family>Washington</family></person>"
  ]
}}

Note that the source data in the attachments is represented as a string if it does not match the envelope data format. For example, in the above JSON envelope, the source attachment is a string, rather than an XML node. This has implications for extracting the source from the envelope as a node; see the example in Query the Data.
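
If you want to experiment with JSON envelopes, the envelope-creation query from this section can be adapted as in the following sketch. The /es-gs/env-json/ URI prefix and the person-envelopes-json collection are hypothetical names chosen here so that the JSON envelopes stay separate from the XML envelopes used in the rest of the walkthrough.

```xquery
xquery version "1.0-ml";
import module namespace person =
    "http://example.org/example-person/Person-1.0.0"
    at "/es-gs/person-1.0.0-conv.xqy";

for $source in fn:collection('raw') return
  let $instance := person:extract-instance-Person($source)
  (: ASSUMPTION: hypothetical URI prefix for the JSON envelopes :)
  let $uri :=
    fn:concat('/es-gs/env-json/', map:get($instance, 'id'), '.json')
  return xdmp:document-insert(
    $uri,
    (: request a JSON envelope instead of the default XML :)
    person:instance-to-envelope($instance, "json"),
    <options xmlns="xdmp:document-insert">
      <collections>
        <collection>person-envelopes-json</collection>
      </collections>
    </options>
  )
```

Because these envelopes go into their own collection, they do not affect the results of the queries against person-envelopes in Query the Data.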

Query the Data

This section illustrates one way to search your entity instance data using the cts:search XQuery function. You can also use other MarkLogic document search APIs, search your instances as row data, or use semantic search. The Entity Services API includes tools to facilitate all these forms of search. For details, see Querying a Model or Entity Instances.

The following example uses the XQuery cts:query API to find all Person entities with a lastName property of Washington, and then emits the original source from which the entity was derived.

  1. Open Query Console in your browser if you do not already have it open.
  2. Add a new query to the workspace by clicking on the + button on the query editor.
  3. Select XQuery in the Query Type dropdown.
  4. Select your content database from the Database dropdown.
  5. Copy and paste the following code into the new query. The code matches documents in the person-envelopes collection where the lastName element has the value washington, and then returns the original source data from the envelope.
    xquery version "1.0-ml";
    import module namespace es = "http://marklogic.com/entity-services"
      at "/MarkLogic/entity-services/entity-services.xqy";
    
    (: match all envelopes containing an entity instance with
     : a lastName property value of 'washington' :)
    let $matches := cts:search(
      fn:collection('person-envelopes'),
      cts:element-query(
        fn:QName('http://marklogic.com/entity-services', 'instance'),
        cts:element-value-query(xs:QName('lastName'), 'washington')
      ))
    (: extract the original source, as a node :)
    for $attachment in $matches/es:envelope/es:attachments/node()
    return typeswitch ($attachment)
      case element() return $attachment
      case text() return xdmp:from-json-string($attachment)
      default return ()
  6. Click the Run button. The query returns a JSON node and an XML node similar to the following:
    { "pid":2345, 
      "given":"Martha", 
      "family":"Washington" }
    
    <person xmlns:es="http://marklogic.com/entity-services">
      <pid>1234</pid>
      <given>George</given>
      <family>Washington</family>
    </person>

The search matches two entity instances, one extracted from JSON source and one extracted from XML source, so the final query results are one JSON node and one XML node.

The search is limited to the envelope documents by specifying the person-envelopes collection. A container query (cts:element-query) further constrains the search to occurrences within the es:instance portion of an envelope document. Finally, a cts:element-value-query is used to match envelopes where the lastName property value is washington.

cts:search(
  fn:collection('person-envelopes'),
  cts:element-query(
    fn:QName('http://marklogic.com/entity-services', 'instance'),
    cts:element-value-query(xs:QName('lastName'), 'washington')
  ))

The container query ensures the search will not find matches in any part of the envelope document except the entity instance. You could similarly search just the es:attachments, but remember that you cannot perform a structured search on JSON source in the attachments because it is stored in the envelope document as a string.

Notice that the example code can return the original XML source data directly out of the envelope document, but the original JSON document must be converted from a string to a JSON node using xdmp:from-json-string, if you want to return it as a node.

Query the Model

When you created a model in Create a Model, MarkLogic automatically generated some facts from the persisted descriptor, as semantic triples. These facts (and any additional facts you add) define the model and enable semantic queries against the model.

For example, you can use a SPARQL query to discover what entity types are defined by a model, what properties are required in an entity instance of a particular type, or the datatype of a particular entity type property. For more details, see Querying a Model or Entity Instances.

The following procedure uses a SPARQL query to generate a list of all the required properties of an instance of the Person entity type:

  1. Open Query Console in your browser if you do not already have it open.
  2. Add a new query to the workspace by clicking on the + button on the query editor.
  3. Select SPARQL Query in the Query Type dropdown.
  4. Select your content database from the Database dropdown.
  5. Copy and paste the following code into the new query. This code retrieves the names of all required properties of a Person entity instance.
    prefix es:<http://marklogic.com/entity-services#>
    select ?ptitle
    where {
      ?x a es:EntityType;
           es:title "Person";
           es:property ?property .
      ?property a es:RequiredProperty;
                  es:title ?ptitle
    }
  6. Click the Run button. The query results are displayed as a table.

You should see results similar to the following:

ptitle
"lastName"
"fullName"
"firstName"

You can also use the SQL and Optic APIs to query your model and entities as rows if you install an Entity Services generated TDE template based on your model. For more details and examples, see Querying a Model or Entity Instances. To learn more about Semantics in MarkLogic Server, see the Semantics Developer's Guide.

Getting Started Using JavaScript

This section uses Server-Side JavaScript and JSON to introduce the Entity Services APIs. If you prefer to use XQuery, see Getting Started Using XQuery. You can also use JSON with XQuery and XML with JavaScript, but these combinations are not illustrated here.

Stage the Source Data

This exercise ingests the raw source data from which we will create entity instances. One benefit of Entity Services is that you do not have to model your data up front. You can load your data as-is and use it in your application, and then incrementally model your entities.

You usually create entity instances from XML or JSON data. The raw data in this example is two XML documents and one JSON document. Each document contains information about a person, such as first name and last name. Each person document also includes a unique person identifier.

Use the following procedure to load the raw source documents into your content database. The newly created documents are put into a collection named raw so we can easily reference them later.

  1. Navigate to Query Console in your browser. For example, if MarkLogic is installed on localhost, navigate to the following URL:
    http://localhost:8000/qconsole
  2. Add a new query to the workspace by clicking on the + button on the query editor.
  3. Select JavaScript in the Query Type dropdown.
  4. Select your content database from the Database dropdown.
  5. Copy and paste the following code into the new query.
    'use strict';
    declareUpdate();
    
    // Synthesize source data in memory. This would normally come
    // from an external source.
    const sourceData = [
      fn.head(xdmp.unquote(
        '<person>' +
          '<pid>1234</pid>' +
          '<given>George</given>' +
          '<family>Washington</family>' +
        '</person>')),
      {pid: 2345,
       given: 'Martha',
       family: 'Washington'},
      fn.head(xdmp.unquote(
        '<person>' +
          '<pid>3456</pid>' +
          '<given>Alexander</given>' +
          '<family>Hamilton</family>' +
        '</person>'))
    ];
    
    // Insert each source item into the db as an XML or JSON doc.
    sourceData.forEach(function(source) {
      let uri = '/es-gs/raw/';
      if (source instanceof Document) {
        // XML doc created by xdmp.unquote
        uri += source.xpath('/node()/pid/data()') + '.xml';
      } else if (source instanceof Object) {
        uri += source.pid + '.json';
      }
      xdmp.documentInsert(uri, source, {collections: ['raw']});
    });
  6. Click the Run button. Three documents are created in the database.
  7. Optionally, click the Explore button and observe that the following documents were created in the raw collection.

    /es-gs/raw/1234.xml
    /es-gs/raw/2345.json
    /es-gs/raw/3456.xml

The sourceData array, above, creates raw data in a very artificial way in order to have a self-contained example. Your source data will normally come from an external source, such as files on the file system, an HTTP request payload, or an mlcp job.

Part of this artificiality is the use of xdmp.unquote as a quick way to create an XML node from a literal. You would normally use NodeBuilder to create in-memory XML documents from Server-Side JavaScript.

Create a Model Descriptor

You define the entity types, entity type properties, and relationships of your model in an XML or JSON model descriptor. The model descriptor is the starting point for creating a model. Model descriptors are discussed in detail in Creating and Managing Models.

The model descriptor in this example is based on the Person example from the Entity Services examples on GitHub. For more details about the original example, see Exploring the Entity Services Open-Source Examples.

This exercise saves a JSON model descriptor as a file on the filesystem. Discussion of the descriptor follows the procedure.

  1. Choose a filesystem directory on your MarkLogic host to hold the model descriptor file. The exercises in this chapter use ARTIFACT_DIR to represent this location.
  2. Create a text file named person-desc.json in ARTIFACT_DIR with the following contents.
    { "info": {
        "title": "Person", 
        "version": "1.0.0", 
        "baseUri": "http://example.org/example-person/", 
        "description":
          "A model of a person, to demonstrate several extractions"
      }, 
      "definitions": {
        "Person": {
          "properties": {
            "id": {"datatype": "string"}, 
            "firstName": {"datatype": "string"}, 
            "lastName": {"datatype": "string"}, 
            "fullName": {"datatype": "string"}, 
            "friends": {
              "datatype": "array", 
              "items": {"$ref": "#/definitions/Person"
            }
          }}, 
          "primaryKey": "id", 
          "required": ["firstName", "lastName", "fullName"]
        }
      }
    }
  3. Set the permissions on ARTIFACT_DIR and the newly created file so that MarkLogic can read the file.

You now have a file named ARTIFACT_DIR/person-desc.json that contains the Person model descriptor. For an example of the equivalent XML descriptor, see Create a Model Descriptor in the XQuery walkthrough.

We stored the model descriptor on the filesystem because this most closely resembles a real development cycle, in which an important project artifact like the model descriptor is under source control.

The descriptor defines a single entity type named Person. A Person entity instance contains string-valued properties named id, firstName, lastName, and fullName, and a list-valued property named friends.

"Person": {
  "properties": {
    "id": {"datatype": "string"}, 
    "firstName": {"datatype": "string"}, 
    "lastName": {"datatype": "string"}, 
    "fullName": {"datatype": "string"}, 
    "friends": {
      "datatype": "array", 
      "items": {"$ref": "#/definitions/Person"
    }
}}, ...

The friends property is a list (array) of references to other Person entities. Since the reference to Person appears in the same descriptor in which Person is defined, it is a local reference. Entity Services knows the shape of the referenced entity type when generating code from a Person model. You can also reference entity types defined elsewhere.
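To make the idea of a local reference concrete, the following plain JavaScript sketch resolves a $ref of the form #/definitions/Name against the descriptor in which it appears. It runs outside MarkLogic and is not part of the Entity Services API. The helper name resolveLocalRef is hypothetical, and treating any $ref that begins with #/definitions/ as local is an assumption based on the example descriptor.

```javascript
// Trimmed copy of the Person descriptor from this chapter.
const descriptor = {
  info: { title: 'Person', version: '1.0.0' },
  definitions: {
    Person: {
      properties: {
        id: { datatype: 'string' },
        friends: {
          datatype: 'array',
          items: { $ref: '#/definitions/Person' }
        }
      },
      primaryKey: 'id'
    }
  }
};

// Hypothetical helper: return the entity type definition that a local
// reference points to, or null if the reference is not local or the
// named type is not defined in this descriptor.
function resolveLocalRef(desc, ref) {
  const prefix = '#/definitions/';
  if (typeof ref !== 'string' || !ref.startsWith(prefix)) {
    return null;
  }
  const typeName = ref.slice(prefix.length);
  const definitions = desc.definitions || {};
  return Object.prototype.hasOwnProperty.call(definitions, typeName)
    ? definitions[typeName]
    : null;
}

const friendsRef =
  descriptor.definitions.Person.properties.friends.items.$ref;
const target = resolveLocalRef(descriptor, friendsRef);
console.log(target === descriptor.definitions.Person); // true
```

Because the reference resolves to a definition in the same descriptor, the shape of each friends item is known without consulting any other model.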

The firstName, lastName, and fullName properties must be present in every Person entity instance because these properties are explicitly flagged as required through the required descriptor property:

"required": ["firstName", "lastName", "fullName"]

The id property is implicitly required because it is identified as the primary key for a Person:

"primaryKey":"id"

The primary key is a unique identifier for an entity instance. You are not required to define a primary key, but the existence of a primary key facilitates other Entity Services features; for details, see Identifying the Primary Key Entity Property.

Since the friends property is neither a primary key nor an explicitly required property, it is optional. That is, you can create Person entities that do not include a friends property.
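The rules above (primary key, explicitly required, otherwise optional) can be sketched in plain JavaScript. This is only an illustration that runs outside MarkLogic; the helper name classifyProperties is hypothetical and is not an Entity Services function.

```javascript
// The Person entity type definition from the descriptor in this chapter.
const personType = {
  properties: {
    id: { datatype: 'string' },
    firstName: { datatype: 'string' },
    lastName: { datatype: 'string' },
    fullName: { datatype: 'string' },
    friends: { datatype: 'array', items: { $ref: '#/definitions/Person' } }
  },
  primaryKey: 'id',
  required: ['firstName', 'lastName', 'fullName']
};

// Hypothetical helper: group property names by how the descriptor
// treats them. A property is optional when it is neither the primary
// key nor listed under "required".
function classifyProperties(entityType) {
  const required = new Set(entityType.required || []);
  const result = { primaryKey: [], required: [], optional: [] };
  for (const name of Object.keys(entityType.properties)) {
    if (name === entityType.primaryKey) {
      result.primaryKey.push(name);
    } else if (required.has(name)) {
      result.required.push(name);
    } else {
      result.optional.push(name);
    }
  }
  return result;
}

console.log(classifyProperties(personType));
```

For the Person type, this groups id as the primary key, firstName, lastName, and fullName as required, and friends as optional.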

You can also flag properties with other characteristics, such as whether or not a property should be indexed for efficient search. For more details, see Writing a Model Descriptor.

Create a Model

Inserting an XML or JSON model descriptor document into the special collection http://marklogic.com/entity-services/models tells MarkLogic the document is part of an Entity Services model. Membership in this collection causes MarkLogic to generate semantic triples that define the model.

We authored a model descriptor in Create a Model Descriptor. The following procedure validates the Person model descriptor and persists it, which creates the model. An explanation of the code follows the procedure.

  1. Open Query Console in your browser if you do not already have it open.
  2. Add a new query to the workspace by clicking on the + button on the query editor.
  3. Select JavaScript in the Query Type dropdown.
  4. Select your content database from the Database dropdown.
  5. Copy and paste the following code into the new query. This code creates a model from a descriptor.
    'use strict';
    declareUpdate();
    const es = require('/MarkLogic/entity-services/entity-services.xqy');
    
    // Retrieve descriptor from filesystem
    const ARTIFACT_DIR = '/space/es/gs/';      // CHANGE THIS VALUE
    const desc = fn.head(
      xdmp.documentGet(ARTIFACT_DIR + 'person-desc.json'));
    
    // Create the model
    xdmp.documentInsert(
      '/es-gs/models/person-1.0.0.json', es.modelValidate(desc),
      {collections: ['http://marklogic.com/entity-services/models']}
    );
  6. Change the value of the ARTIFACT_DIR variable to the directory where you saved the model descriptor in Create a Model Descriptor. Include the trailing directory separator in the pathname.
  7. Click the Run button. A model is created. The descriptor portion is persisted as a document with the URI /es-gs/models/person-1.0.0.json.

    If the query is unable to open the model descriptor file, check the permissions on the directory and file.

  8. Optionally, click the Explore button at the top of the query editor to view the descriptor document in the database.

The model is created by persisting the descriptor as part of the collection http://marklogic.com/entity-services/models.

xdmp.documentInsert(
  '/es-gs/models/person-1.0.0.json', es.modelValidate(desc),
  {collections: ['http://marklogic.com/entity-services/models']}
);

The example also uses es.modelValidate to check the descriptor for errors before inserting it. An invalid descriptor will generate an invalid model. If the descriptor is invalid, es.modelValidate throws an exception. If you know your model descriptor is valid, you can skip validation. Skipping validation is faster, but validation is recommended during development.
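The validate-then-insert pattern can be sketched in plain JavaScript as follows. Everything here is a stand-in that runs outside MarkLogic: validateDescriptorSketch and insertModel are hypothetical names, the URI /es-gs/models/person-bad.json is made up for the example, and the checks shown are illustrative assumptions rather than the rules that es.modelValidate actually applies. The point is only that a validator which throws prevents the insert from happening.

```javascript
// Stand-in for es.modelValidate. The checks are illustrative
// assumptions, not the rules Entity Services applies.
function validateDescriptorSketch(desc) {
  if (!desc.info || !desc.info.title || !desc.info.version) {
    throw new Error('descriptor must have info.title and info.version');
  }
  for (const [typeName, type] of Object.entries(desc.definitions || {})) {
    const props = type.properties || {};
    const named = [...(type.required || [])];
    if (type.primaryKey) {
      named.push(type.primaryKey);
    }
    for (const name of named) {
      if (!(name in props)) {
        throw new Error(typeName + ': unknown property "' + name + '"');
      }
    }
  }
  return desc; // returned so the call can be nested inside an insert
}

// Stand-in for xdmp.documentInsert: records what would be persisted.
const persisted = new Map();
function insertModel(uri, desc) {
  persisted.set(uri, validateDescriptorSketch(desc));
}

const good = {
  info: { title: 'Person', version: '1.0.0' },
  definitions: { Person: { properties: { id: {} }, primaryKey: 'id' } }
};
const bad = {
  info: { title: 'Person', version: '1.0.0' },
  definitions: { Person: { properties: { id: {} }, required: ['lastName'] } }
};

insertModel('/es-gs/models/person-1.0.0.json', good);
try {
  insertModel('/es-gs/models/person-bad.json', bad);
} catch (e) {
  console.log('rejected: ' + e.message);
}
console.log(persisted.size); // 1
```

Only the valid descriptor is recorded, because the exception is raised before the insert stand-in stores anything.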

Create and Deploy an Instance Converter

An instance converter is an XQuery library module containing code for transforming your raw source data into entity instances that conform to your model. You can use the Entity Services API to generate a baseline converter, and then customize it to meet the requirements of your application.

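This section walks through deploying a converter module in the following steps:

  • Generate the Default Converter Module
  • Customize the Converter Module
  • Deploy the Converter Module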

Generate the Default Converter Module

This exercise creates an instance converter module template using the es.instanceConverterGenerate function. An explanation of the code follows the procedure.

  1. Open Query Console in your browser if you do not already have it open.
  2. Add a new query to the workspace by clicking on the + button on the query editor.
  3. Select JavaScript in the Query Type dropdown.
  4. Select your content database from the Database dropdown.
  5. Copy and paste the following code into the new query. This code generates the instance converter module and saves it to the filesystem.
    'use strict';
    const es = require('/MarkLogic/entity-services/entity-services.xqy');
    
    const ARTIFACT_DIR = '/space/es/gs/';       // CHANGE THIS VALUE
    const desc = cts.doc('/es-gs/models/person-1.0.0.json');
    xdmp.save(
      ARTIFACT_DIR + 'person-1.0.0-conv.xqy',
      es.instanceConverterGenerate(desc)
    );
  6. Change the value of ARTIFACT_DIR to a directory on your MarkLogic host where the generated code can be saved. Include the trailing directory separator in the pathname.

    The directory must be readable and writable by MarkLogic.

  7. Click the Run button. The file ARTIFACT_DIR/person-1.0.0-conv.xqy is created.
  8. Optionally, go to ARTIFACT_DIR and review the generated code. In the next section, we will modify this code.

We could have inserted the converter module directly into the modules database to which it will eventually be deployed. However, the converter is an important project artifact, so you would normally save it to a file and place it under source control. Also, most applications will require converter customizations.

The generated code is runnable as-is, but you are expected to customize the code to match the characteristics of your source data and the requirements of your application. The generated code contains comments to assist you with customization. You will need to understand some XQuery to customize the converter for a production application.

The generated module defines the following functions. The namespace prefix defined for the module is derived from the model title.

  • person:extract-instance-Person - Create a Person instance from raw source data. You are expected to customize this function to harmonize your source data with your model.
  • person:instance-to-envelope - Convert an entity instance into an XML or JSON envelope document that encapsulates the instance and the original source. Most applications will use this function as-is, but you might customize if you include additional data in the envelope.
  • person:instance-to-canonical - Convert the JSON object representation of an instance into its canonical XML or JSON representation. You will not usually need to customize this function or call it directly; it exists for use by the generated instance-to-envelope function.

As with any XQuery module in MarkLogic, you can use the instance converter module from Server-Side JavaScript, once you install the module. Bring the module into scope using a require statement. For example, if the module is installed in the modules database with the URI /es-gs/person-1.0.0-conv.xqy, then use a require statement such as the following:

const person = require('/es-gs/person-1.0.0-conv.xqy');

Invoke the functions using their JavaScript-style, camel-case names. For example, in the case of the Person entity type, the module converter functions can be invoked from Server-Side JavaScript using the following names, assuming the module is represented by a variable named person, as shown in the above require statement.

person.extractInstancePerson
person.instanceToEnvelope
person.instanceToCanonical

For more details, see Creating an Instance Converter Module.
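The naming correspondence between the XQuery function names and their JavaScript-style names can be sketched as a small string transformation. This plain JavaScript helper (toCamelCase, a hypothetical name) only illustrates the pattern seen in the names above; it is not how MarkLogic exposes the functions internally.

```javascript
// Hypothetical helper: convert a hyphenated XQuery function local name
// to the camel-case name used from Server-Side JavaScript. Each hyphen
// is dropped and the character that follows it is upper-cased.
function toCamelCase(xqueryLocalName) {
  return xqueryLocalName.replace(/-(.)/g, (match, ch) => ch.toUpperCase());
}

for (const name of [
  'extract-instance-Person',
  'instance-to-envelope',
  'instance-to-canonical'
]) {
  console.log(name + ' -> ' + toCamelCase(name));
}
```

Applied to the three converter functions, this yields extractInstancePerson, instanceToEnvelope, and instanceToCanonical.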

Customize the Converter Module

The converter module generated by Entity Services implements a modeltitle:extract-instance-T function for each entity type T defined in the descriptor. In our example, the converter module implements a person:extract-instance-Person function.

The default implementation of an instance converter assumes the source data has the same shape as a Person entity. However, our source data has pid, given, and family properties instead of id, firstName, lastName, and fullName. You must modify person:extract-instance-Person to do the following:

  • Extract id from pid
  • Extract firstName from given
  • Extract lastName from family
  • Synthesize fullName by concatenating given and family

Production applications can require many other types of customizations. For example, you might need to normalize a date value, perform a more sophisticated type conversion, or extract the value of an entity property from somewhere other than the source data.
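Before editing the XQuery, it can help to see the intended mapping for this example in isolation. The following plain JavaScript sketch mirrors the four changes listed above on an ordinary object. The function name extractPersonSketch is hypothetical; the real converter is XQuery and reads from XML or JSON nodes, so treat this only as a description of the intended result.

```javascript
// Hypothetical plain-JavaScript analog of the customized extraction
// logic. The source here is a plain object, not a MarkLogic node.
function extractPersonSketch(source) {
  const firstName = String(source.given);
  const lastName = String(source.family);
  return {
    id: String(source.pid),               // id comes from pid
    firstName: firstName,                 // firstName comes from given
    lastName: lastName,                   // lastName comes from family
    fullName: firstName + ' ' + lastName  // synthesized from both
  };
}

console.log(
  extractPersonSketch({ pid: 2345, given: 'Martha', family: 'Washington' })
);
```

Note that pid is numeric in the JSON source but id is a string in the model, which is why the sketch converts it explicitly.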

Use the following procedure to customize the instance extraction code. A discussion of the code follows the procedure.

  1. Confirm you have read and write permissions on ARTIFACT_DIR/person-1.0.0-conv.xqy. If not, set the permissions accordingly. The file must also be readable by MarkLogic.
  2. Open ARTIFACT_DIR/person-1.0.0-conv.xqy in the text editor of your choice.
  3. Locate the section of person:extract-instance-Person that sets the value of the id, firstName, lastName, and fullName properties. The code should look similar to the following:
    let $id  :=       $source-node/id ! xs:string(.)
    let $firstName := $source-node/firstName ! xs:string(.)
    let $lastName  := $source-node/lastName ! xs:string(.)
    let $fullName  := $source-node/fullName ! xs:string(.)
  4. Replace these lines with the following code. The changes are the source element names (pid, given, and family) and the fn:concat expression that builds fullName.
    let $id  :=       $source-node/pid ! xs:string(.)
    let $firstName := $source-node/given ! xs:string(.)
    let $lastName  := $source-node/family ! xs:string(.)
    let $fullName  := fn:concat($firstName, " ", $lastName) ! xs:string(.)
  5. Save your changes.

Recall that the Person entity type has id, firstName, lastName, fullName, and friends properties. The default implementation of person:extract-instance-Person assumes the source data contains the same properties. For example, the default implementation includes the following code:

let $id  :=       $source-node/id ! xs:string(.)
let $firstName := $source-node/firstName ! xs:string(.)
let $lastName  := $source-node/lastName ! xs:string(.)
let $fullName  := $source-node/fullName ! xs:string(.)

Each of the variable declarations assumes the value of a property in the new entity instance ($instance) is the value of a property with the same name in the source node. Since that assumption does not match our source data, customization is required.

Our customization changes the names of the source fields to match our source data, and derives the fullName property value from the given and family source values. The customized code is shown below.

let $id  :=       $source-node/pid ! xs:string(.)
let $firstName := $source-node/given ! xs:string(.)
let $lastName  := $source-node/family ! xs:string(.)
let $fullName  := fn:concat($firstName, " ", $lastName) ! xs:string(.)

Deploy the Converter Module

Like any application code, the converter module must be deployed to MarkLogic before you can use it. Best practice is to install it in the modules database of your App Server. Our example uses the pre-defined App Server on port 8000, which is configured to use the Modules database.

The following procedure uses Server-Side JavaScript to install the customized converter module into the Modules database. You could also use XQuery or the REST, Java, or Node.js Client APIs for this task.

  1. Open Query Console in your browser if you do not already have it open.
  2. Add a new query to the workspace by clicking on the + button on the query editor.
  3. Select JavaScript in the Query Type dropdown.
  4. Select the Modules database from the Database dropdown.
  5. Copy and paste the following code into the new query. This code saves the instance converter module to the database.
    // ** RUN AGAINST MODULES DB **
    'use strict';
    declareUpdate();
    
    const ARTIFACT_DIR = '/space/es/gs/';       // CHANGE THIS VALUE
    xdmp.documentLoad(
      ARTIFACT_DIR + 'person-1.0.0-conv.xqy',
      { uri: '/es-gs/person-1.0.0-conv.xqy' }
    );
  6. Modify the value of ARTIFACT_DIR to the directory where you previously saved the converter module. Include the trailing directory separator in the pathname.
  7. Click the Run button. The converter module is inserted into the Modules database.
  8. Optionally, click the Explore button to confirm the presence of the module in the database.

Create Entity Instances

An envelope document is the recommended way to persist and interact with entity instances in MarkLogic. An envelope document encapsulates an entity instance with model metadata and the original source. Storing the logical aspects of an entity (canonical instance representation, metadata, source) in one physical document facilitates managing, searching, retrieving, indexing, and securing your data.

An envelope document enables your application to query data as harmonized instances, but still recover the raw source when needed. You can generate either XML or JSON envelope documents.

You can use the person.instanceToEnvelope function in the converter module to create entity envelope documents. The input is an instance created by calling person.extractInstancePerson. If you do not explicitly specify an envelope format of xml or json, the function generates an XML envelope.

Use the following procedure to create envelope documents from the source documents loaded in Stage the Source Data. Discussion of the code follows the procedure.

  1. Open Query Console in your browser if you do not already have it open.
  2. Add a new query to the workspace by clicking on the + button on the query editor.
  3. Select JavaScript in the Query Type dropdown.
  4. Select your content database from the Database dropdown.
  5. Copy and paste the following code into the new query. This code creates a Person entity envelope document from each source document.
    'use strict';
    declareUpdate();
    const es = require('/MarkLogic/entity-services/entity-services.xqy');
    const person = require('/es-gs/person-1.0.0-conv.xqy');
    
    for (const source of fn.collection('raw')) {
      let instance = person.extractInstancePerson(source);
      let uri = '/es-gs/env/' + instance.id + '.xml';
      xdmp.documentInsert(
        uri, person.instanceToEnvelope(instance, "xml"),
        {collections: ['person-envelopes']}
      );
    }
  6. Click the Run button. The following envelope documents are created in your content database:

    /es-gs/env/1234.xml
    /es-gs/env/2345.xml
    /es-gs/env/3456.xml

  7. Optionally, click the Explore button to confirm creation of the envelope documents.

An envelope document can be either XML or JSON. This exercise uses XML envelopes. An XML envelope has the following form. The es:attachments portion of the envelope holds the raw source data.

<es:envelope xmlns:es="http://marklogic.com/entity-services">
  <es:instance>
    <es:info>metadata from info section of descriptor</es:info>
    ...instance canonical XML..
  </es:instance>
  <es:attachments>
    source data
  </es:attachments>
</es:envelope>

The equivalent JSON envelope, generated by passing "json" as the second parameter of person.instanceToEnvelope, has the following form:

{ "envelope": {
  "instance": {
    "info": { ...metadata from info section of descriptor... },
    ...instance canonical JSON...
  },
  "attachments": [ ...source data... ]
}}

Except when constructing path expressions, you do not usually have to be aware of the internal structure of an envelope document, because the Entity Services API includes functions that extract an instance or the attachments from an envelope document for you. For details, see Managing Entity Instances.

You create an envelope document for some entity type T using the extractInstanceT and instanceToEnvelope functions of the instance converter. (These are the extract-instance-T and instance-to-envelope functions in the XQuery module.) The general pattern is as follows:

modeltitle.instanceToEnvelope(
  modeltitle.extractInstanceT(source))

For example, the sample code does the following to create a Person entity envelope:

let instance = person.extractInstancePerson(source);
...
xdmp.documentInsert(
  uri, person.instanceToEnvelope(instance, "xml"),
  ...)

Inside person.instanceToEnvelope, the person.instanceToCanonical function is called to create the Person entity embedded inside es:envelope/es:instance.

The table below illustrates the progression from raw data to XML envelope document, through use of the instance converter module functions.

Operation Result
ingest raw source
{
  "pid": 2345, 
  "given": "Martha", 
  "family": "Washington"
}
extractInstancePerson(source)

input: raw source
output: a map:map (json:object), shown here serialized as JSON
{"$attachments": "{\"pid\":2345, \"given\":\"Martha\", \"family\":\"Washington\"}", 
  "$type": "Person", 
  "id": "2345", 
  "firstName": "Martha", 
  "lastName": "Washington", 
  "fullName": "Martha Washington"
}
instanceToCanonical(instance, "xml")

input: instance map:map
output: XML elem
<Person>
  <id>2345</id>
  <firstName>Martha</firstName>
  <lastName>Washington</lastName>
  <fullName>Martha Washington</fullName>
</Person>
instanceToEnvelope(instance, "xml")

input: instance map:map
output: XML envelope doc
<es:envelope
    xmlns:es="http://marklogic.com/entity-services">
  <es:instance>
    <es:info>
      <es:title>Person</es:title>
      <es:version>1.0.0</es:version>
    </es:info>
    <Person>
      <id>2345</id>
      <firstName>Martha</firstName>
      <lastName>Washington</lastName>
      <fullName>Martha Washington</fullName>
    </Person>
  </es:instance>
  <es:attachments>{"pid":2345, "given":"Martha", "family":"Washington"}</es:attachments>
</es:envelope>

The following is an equivalent JSON envelope, generated by calling instanceToEnvelope(instance, "json"):

{ "envelope": {
  "instance": {
    "info": {
      "title":"Person", 
      "version":"1.0.0"
    }, 
    "Person": {
      "id":"2345", 
      "firstName":"Martha", 
      "lastName":"Washington", 
      "fullName":"Martha Washington"}
    }, 
    "attachments":[
      "<person><pid>2345</pid><given>Martha</given><family>Washington</family></person>"
    ]
}}

Note that the source data in the attachments is represented as a string if it does not match the envelope data format. For example, in the above JSON envelope, the source attachment is a string, rather than an XML node. This has implications for extracting the source from the envelope as a node; see the example in Query the Data.
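The JSON envelope layout and the attachment rule described above can be mimicked with a short plain JavaScript sketch. It runs outside MarkLogic and is not the generated instanceToEnvelope function. The name toJsonEnvelopeSketch and the format/value wrapper around each source are hypothetical, and keeping a JSON source as structured data inside a JSON envelope is an assumption inferred from the rule that only non-matching formats are stored as strings.

```javascript
// Hypothetical sketch of the JSON envelope shape shown above.
function toJsonEnvelopeSketch(info, typeName, instance, sources) {
  const attachments = sources.map(source =>
    source.format === 'json'
      ? source.value           // same format as the envelope: stays structured
      : String(source.value)); // different format (XML here): kept as a string
  return {
    envelope: {
      instance: {
        info: { title: info.title, version: info.version },
        [typeName]: instance
      },
      attachments: attachments
    }
  };
}

const martha = {
  id: '2345',
  firstName: 'Martha',
  lastName: 'Washington',
  fullName: 'Martha Washington'
};
const jsonEnvelope = toJsonEnvelopeSketch(
  { title: 'Person', version: '1.0.0' },
  'Person',
  martha,
  [{ format: 'xml',
     value: '<person><pid>2345</pid><given>Martha</given>' +
            '<family>Washington</family></person>' }]
);

console.log(typeof jsonEnvelope.envelope.attachments[0]); // string
console.log(JSON.stringify(jsonEnvelope, null, 2));
```

A consumer that wants the XML source back as a node would have to parse that string, which is the situation the note above describes.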

Query the Data

This section illustrates one way to search your entity instance data, using the JSearch API. You can also use other MarkLogic document search APIs, search your instances as row data, or use semantic search. The Entity Services API includes tools to facilitate all these forms of search. For details, see Querying a Model or Entity Instances.

The following example uses the JSearch API to find all Person entities with a lastName property of Washington.

  1. Open Query Console in your browser if you do not already have it open.
  2. Add a new query to the workspace by clicking on the + button on the query editor.
  3. Select JavaScript in the Query Type dropdown.
  4. Select your content database from the Database dropdown.
  5. Copy and paste the following code into the new query. The code matches documents in the person-envelopes collection where the es:instance element includes a lastName element with the value washington, and then returns the original source data from the envelope.
    'use strict';
    const jsearch = require('/MarkLogic/jsearch.sjs');
    
    // Find all occurrences of lastName with the value 'washington' contained
    // in an es:instance element. Return just the documents in the results.
    const people = jsearch.collections('person-envelopes');
    const matches = people.documents()
      .where(cts.elementQuery(
          fn.QName('http://marklogic.com/entity-services', 'instance'),
          cts.elementValueQuery('lastName', 'washington')))
      .map(match => match.document)
      .result();
    
    // Extract the raw source data from the search results, 
    // as XML or JSON nodes
    const asNodes = [];
    for (let match of matches.results) {
      let attachment = fn.head(match.xpath('//*:attachments/node()'));
      if (attachment instanceof Element) {
        // already an XML node
        asNodes.push(attachment);
      } else {
        // serialized JSON; deserialize to a JSON document node
        asNodes.push(fn.head(xdmp.unquote(attachment)));
      }
    }
    // Dump the results in Query Console. The conversion from array
    // to Sequence is just used to finesse the way QC renders array
    // items that are XML nodes. It is not functionally significant.
    Sequence.from(asNodes);
  6. Click the Run button. You should see results similar to the following:
    { "pid":2345, 
      "given":"Martha", 
      "family":"Washington" }
    
    <person xmlns:es="http://marklogic.com/entity-services">
      <pid>1234</pid>
      <given>George</given>
      <family>Washington</family>
    </person>

The search matches two envelope documents, one extracted from JSON source and one extracted from XML source.

The search is first constrained to documents in the person-envelopes collection. Then a container query (cts.elementQuery) further constrains matches to those contained in an es:instance element. Finally, a value query (cts.elementValueQuery) is used to find elements named lastName with the value 'washington'.

const people = jsearch.collections('person-envelopes');
const matches = people.documents()
  .where(cts.elementQuery(
      fn.QName('http://marklogic.com/entity-services', 'instance'),
      cts.elementValueQuery('lastName', 'washington')))
  ...

The container query ensures the search will not find matches in any part of the envelope data except the instance. You could similarly search just the attachments, though you cannot effectively perform a structured search on raw JSON data this way because JSON source is stored in the XML envelope document as a serialized string.
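
The scoping behavior described above can be modeled in plain JavaScript outside MarkLogic. The following is only a sketch: it does not use the cts or JSearch APIs, and the envelope objects and the containsValue and findInContainer helpers are hypothetical stand-ins. It illustrates that a search scoped to the instance ignores the attachments, and that a serialized JSON attachment is a plain string with no fields to match structurally.

```javascript
// Hypothetical in-memory stand-ins for two envelope documents.
// Real envelopes are XML documents; these objects only model their shape.
const envelopes = [
  {
    instance: { Person: { id: 2345, firstName: 'Martha', lastName: 'Washington' } },
    // JSON source is kept as a serialized string
    attachments: '{"pid":2345, "given":"Martha", "family":"Washington"}'
  },
  {
    instance: { Person: { id: 1234, firstName: 'George', lastName: 'Washington' } },
    // XML source is kept as structured content
    attachments: { person: { pid: 1234, given: 'George', family: 'Washington' } }
  }
];

// Recursively look for a property `name` whose value equals `value`,
// ignoring case. Strings have no properties, so they never match.
function containsValue(node, name, value) {
  if (node === null || typeof node !== 'object') return false;
  return Object.entries(node).some(([key, child]) =>
    (key === name && String(child).toLowerCase() === value.toLowerCase()) ||
    containsValue(child, name, value));
}

// Restrict the search to one top-level container of each envelope.
function findInContainer(container, name, value) {
  return envelopes.filter(env => containsValue(env[container], name, value));
}

const byInstance = findInContainer('instance', 'lastName', 'washington');
const byAttachment = findInContainer('attachments', 'family', 'washington');
const wrongScope = findInContainer('attachments', 'lastName', 'washington');

console.log(byInstance.length);   // both envelopes match in the instance
console.log(byAttachment.length); // only the XML-sourced attachment matches
console.log(wrongScope.length);   // lastName does not occur in any attachment
```

In this model, the Martha Washington envelope is invisible to the attachments search for the same reason given above: its source data is a string until it is deserialized.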

The map feature of JSearch returns only the matched documents, eliminating search metadata such as the URI, relevance score, and confidence. The mapper is used here only to streamline the output; neither Entity Services nor the JSearch API requires a mapper.

people.documents()
  .where(...)
  .map(match => match.document)
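
The effect of the mapper can be sketched in plain JavaScript. The result objects below are hypothetical: their field names are assumed from the metadata named above, the URIs and values are made up, and the sketch uses the standard Array map method rather than the JSearch map method.

```javascript
// Hypothetical search results; field names and values are assumptions
// and only approximate what a JSearch document search returns.
const rawResults = [
  { uri: '/hypothetical/martha.xml', score: 16, confidence: 0.5,
    document: '<es:envelope>...Martha Washington...</es:envelope>' },
  { uri: '/hypothetical/george.xml', score: 16, confidence: 0.5,
    document: '<es:envelope>...George Washington...</es:envelope>' }
];

// Same arrow function as in the walkthrough: keep only the document.
const keepDocument = match => match.document;
const documentsOnly = rawResults.map(keepDocument);

// Each item is now just the document, with no search metadata attached.
console.log(documentsOnly);
```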

The search produces the following output, which we saved to the matches variable for subsequent processing.

{"results":[
  <es:envelope xmlns:es="http://marklogic.com/entity-services">
    <es:instance>
      <es:info>
        <es:title>Person</es:title>
        <es:version>1.0.0</es:version>
      </es:info>
      <Person>
        <id>2345</id>
        <firstName>Martha</firstName>
        <lastName>Washington</lastName>
        <fullName>Martha Washington</fullName>
      </Person>
    </es:instance>
    <es:attachments>{"pid":2345, "given":"Martha", "family":"Washington"}</es:attachments>
  </es:envelope>
  <es:envelope xmlns:es="http://marklogic.com/entity-services">
    <es:instance>
      <es:info>
        <es:title>Person</es:title>
        <es:version>1.0.0</es:version>
      </es:info>
      <Person>
        <id>1234</id>
        <firstName>George</firstName>
        <lastName>Washington</lastName>
        <fullName>George Washington</fullName>
      </Person>
    </es:instance>
    <es:attachments>
      <person>
        <pid>1234</pid>
        <given>George</given>
        <family>Washington</family>
      </person>
    </es:attachments>
  </es:envelope>
  ], 
  "estimate":2
}

Note that the example code can return the original XML source data directly out of the envelope document because the attachments contain an XML element node. However, if you want to work with the original JSON source data as structured data, you must convert it from a string to a JSON node using xdmp.unquote. This conversion is the purpose of the following section of code:

if (attachment instanceof Element) {
  // already an XML node
  asNodes.push(attachment);
} else {
  // serialized JSON; deserialize to a JSON document node
  asNodes.push(fn.head(xdmp.unquote(attachment)));
}

(The accumulation of the attachments into the asNodes array and subsequent conversion of asNodes into a Sequence is just done to finesse the way Query Console displays results.)
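
The difference between a serialized string and structured data can also be seen outside MarkLogic with the standard JSON.parse function. This is only an analogy: the walkthrough code uses xdmp.unquote to produce a JSON document node, whereas JSON.parse yields a plain JavaScript object. The toStructured helper is hypothetical.

```javascript
// The JSON attachment as it appears in the envelope: a serialized string.
const serialized = '{"pid":2345, "given":"Martha", "family":"Washington"}';

// Hypothetical helper: parse strings and pass already-structured values
// through unchanged, mirroring the if/else branch in the walkthrough.
function toStructured(attachment) {
  return typeof attachment === 'string' ? JSON.parse(attachment) : attachment;
}

const person = toStructured(serialized);

console.log(typeof serialized.family); // 'undefined': a string has no fields
console.log(person.family);            // 'Washington'
```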

For more details and examples, see Querying a Model or Entity Instances.

Query the Model

When you created a model in Create a Model, MarkLogic automatically generated semantic triples from the descriptor. These triples define the model. You can add more facts about the model in the form of additional triples. You can use SPARQL or the Optic API to query a model.

For example, you can use a SPARQL query to discover what entity types are defined by a model, what properties are required in an entity instance of a particular type, or the datatype of a particular entity type property. For more details, see Querying a Model or Entity Instances.

The following procedure uses a SPARQL query to generate a list of all the required properties of an instance of the Person entity type:

  1. Open Query Console in your browser if you do not already have it open.
  2. Add a new query to the workspace by clicking on the + button on the query editor.
  3. Select SPARQL Query in the Query Type dropdown.
  4. Select your content database from the Database dropdown.
  5. Copy and paste the following code into the new query. This code retrieves the names of all required properties of a Person entity instance.
    prefix es:<http://marklogic.com/entity-services#>
    select ?ptitle
    where {
      ?x a es:EntityType;
           es:title "Person";
           es:property ?property .
      ?property a es:RequiredProperty;
                  es:title ?ptitle
    }
  6. Click the Run button. The query results are displayed as a table.

You should see results similar to the following:

ptitle
"lastName"
"fullName"
"firstName"

You can also use the SQL and Optic APIs to query your model and entities as rows if you install an Entity Services generated TDE template based on your model. For more details and examples, see Querying a Model or Entity Instances. To learn more about Semantics in MarkLogic Server, see the Semantics Developer's Guide.
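
To see how the two graph patterns in the SPARQL query above combine, consider the following plain JavaScript sketch. It is a model only: the triples are hypothetical and use short made-up names rather than the IRIs that Entity Services actually generates, and the match and requiredPropertyTitles helpers are illustrative, not part of any MarkLogic API.

```javascript
// Hypothetical, simplified triples: [subject, predicate, object].
// Real model triples use full IRIs; these short names are assumptions.
const triples = [
  ['Person', 'rdf:type', 'es:EntityType'],
  ['Person', 'es:title', 'Person'],
  ['Person', 'es:property', 'Person/id'],
  ['Person', 'es:property', 'Person/firstName'],
  ['Person', 'es:property', 'Person/lastName'],
  ['Person', 'es:property', 'Person/fullName'],
  ['Person/id', 'es:title', 'id'],
  ['Person/firstName', 'es:title', 'firstName'],
  ['Person/lastName', 'es:title', 'lastName'],
  ['Person/fullName', 'es:title', 'fullName'],
  ['Person/firstName', 'rdf:type', 'es:RequiredProperty'],
  ['Person/lastName', 'rdf:type', 'es:RequiredProperty'],
  ['Person/fullName', 'rdf:type', 'es:RequiredProperty']
];

// Return every triple matching a pattern; null acts as a wildcard.
function match(s, p, o) {
  return triples.filter(([ts, tp, tobj]) =>
    (s === null || ts === s) &&
    (p === null || tp === p) &&
    (o === null || tobj === o));
}

function requiredPropertyTitles(entityTitle) {
  const titles = [];
  // ?x a es:EntityType ; es:title "Person"
  const types = match(null, 'rdf:type', 'es:EntityType')
    .map(([subject]) => subject)
    .filter(subject => match(subject, 'es:title', entityTitle).length > 0);
  for (const type of types) {
    // ?x es:property ?property
    for (const [, , property] of match(type, 'es:property', null)) {
      // ?property a es:RequiredProperty
      if (match(property, 'rdf:type', 'es:RequiredProperty').length === 0) continue;
      // ?property es:title ?ptitle
      for (const [, , title] of match(property, 'es:title', null)) {
        titles.push(title);
      }
    }
  }
  return titles;
}

const required = requiredPropertyTitles('Person');
console.log(required);
```

In this model the id property is excluded because no triple marks it as an es:RequiredProperty; the join on the shared ?property variable does the filtering.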

Next Steps

The following topics can help deepen your understanding of the Entity Services API.

  • Explore the end-to-end Entity Services examples on GitHub. For details, see Exploring the Entity Services Open-Source Examples.
  • Learn more about defining model descriptors; see Creating and Managing Models.

    Model descriptors support several features not covered here, such as identifying a primary key and flagging properties for indexing to facilitate fast searches.

  • Learn about generating additional code and configuration artifacts from your model using the Entity Services API; see Generating Code and Other Artifacts.

    For example, you can use Entity Services to generate Search and Client API query options and database configuration artifacts based on your model. You can also generate a Template Driven Extraction (TDE) template that enables row and semantic search of instances. For details, see Generating a TDE Template.

  • Learn more about querying models and instance data; see Querying a Model or Entity Instances.
  • Explore the open-source MarkLogic Data Hub project on GitHub (http://github.com/marklogic/marklogic-data-hub). Version 2.0 and later use the Entity Services API to create a Data Hub application that enables quick and easy entity modeling and creation of entities from source data.
