
MarkLogic 9 Product Documentation
Entity Services Developer's Guide
— Chapter 1

Introduction to Entity Services

Business analysts often describe processes in terms of logical business entities, such as Customers and Orders, and the relationships between them. MarkLogic Entity Services is a set of tools and interfaces that make it easier to create applications that manipulate these business entities, even when your raw data has a different structure.

You can use Entity Services to model your business entities and generate code and configuration artifacts that facilitate creating, querying, and exporting entity instances.

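This chapter contains the following topics:

  • Terms and Definitions
  • Why Use Entity Modeling?
  • Entity Services Overview
  • Next Steps
  • Exploring the Entity Services Open-Source Examples
  • Security Considerations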

Terms and Definitions

The material in this guide assumes the reader is familiar with the following terms and definitions:

  • model descriptor: A definition of a set of entity types, their properties, and relationships. You use a descriptor to create a model and to generate model-based application code and configuration artifacts. For details, see Creating and Managing Models.
  • model: A model includes entity type definitions, entity property definitions, relationships between entity types, and facts about the model (as semantic triples). A model descriptor contributes the entity type definitions, entity property definitions, and relationships between entity types. MarkLogic generates a default set of facts from the descriptor, and you can add your own facts to the model. For details, see Creating and Managing Models.
  • entity: An abstraction of a logical business object that applications can store and manipulate. For example, a sales model might include entities such as a customer, an order, or an inventory item.
  • entity type: A definition of the characteristics of an entity instance, including its properties and its relationships to other entities.
  • entity instance: A concrete instantiation of an entity type, represented either as a populated data structure describing an individual entity or as a document containing such a data structure.
  • entity property: A concrete characteristic of an entity type. For example, a customer entity type might have properties such as a name, address, and customer ID. An entity property whose type is an entity type expresses an entity relationship.
  • entity relationship: A logical relationship between entity types. For example, an order entity type might include relationships with customer and inventory item entity types. In Entity Services, an entity relationship is expressed as an entity property whose type (or array item type) is an entity type, rather than a scalar type or an array of scalars. For details, see Defining Entity Relationships.
  • envelope document: By Entity Services convention, a document that encapsulates an entity instance, metadata, and, optionally, the raw source from which the entity was generated. For details, see Managing Entity Instances.
  • local reference: In a model descriptor, a reference to an entity type that can be fully resolved within that descriptor. For example, if a model defines Race and Runner entity types, and the Race entity type has a property that is an array of references to Runners, then those references are local references. For details, see Defining Entity Relationships.
  • external reference: In a model descriptor, a reference to an entity type that is not defined within the same descriptor. For details, see Defining Entity Relationships.
  • TDE template: A Template Driven Extraction (TDE) template. You can use Entity Services to generate a template that enables querying your entity instance data as rows or semantic triples. For details, see Generating a TDE Template and Search Basics for Instance Data.
  • harmonization: The process of transforming data from disparate sources into a common, model-based representation.
  • data hub: An application that takes in raw data from disparate sources and transforms it into canonical business entities that applications can use without regard to differences in the original sources.
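The distinction between local and external references can be checked mechanically. The following Python sketch is a minimal illustration, not part of Entity Services: it assumes references are written with a JSON Schema style "$ref" key, that local references use the "#/definitions/<Type>" form, and it uses a made-up external IRI for a hypothetical Venue type. The function name classify_refs is also hypothetical.

```python
# Hypothetical descriptor based on the Race/Runner example in the definitions.
# Assumption: references use a "$ref" key; the Venue IRI below is made up.
DESCRIPTOR = {
    "info": {"title": "Race", "version": "0.0.1"},
    "definitions": {
        "Race": {
            "properties": {
                "name": {"datatype": "string"},
                "runners": {
                    "datatype": "array",
                    "items": {"$ref": "#/definitions/Runner"},
                },
                "venue": {"$ref": "http://example.org/Venues-1.0.0/Venue"},
            }
        },
        "Runner": {"properties": {"name": {"datatype": "string"}}},
    },
}

LOCAL_PREFIX = "#/definitions/"


def classify_refs(descriptor):
    """Return {(type, property): "local" | "external"} for every reference."""
    types = descriptor["definitions"]
    result = {}
    for type_name, type_def in types.items():
        for prop_name, prop in type_def.get("properties", {}).items():
            # A reference can sit on the property itself or on its array items.
            ref = prop.get("$ref") or prop.get("items", {}).get("$ref")
            if ref is None:
                continue  # scalar property, not a relationship
            target = ref[len(LOCAL_PREFIX):] if ref.startswith(LOCAL_PREFIX) else None
            # Anything that does not resolve to a type in this descriptor
            # is treated as external.
            result[(type_name, prop_name)] = "local" if target in types else "external"
    return result


for key, kind in classify_refs(DESCRIPTOR).items():
    print(key, kind)
```

Here the runners property resolves within the descriptor, so it is local; the venue property points outside it, so it is external.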

Why Use Entity Modeling?

Enterprise applications must often work with data from multiple sources. The data shares common conceptual objects, such as customer or order, but representation details can differ significantly. The meaning of the data is spread across schemas, application code, ETL code, and the minds of developers, DBAs, and data stewards.

Working directly with this heterogeneous data imposes cognitive load on developers and adds complexity to applications. A model-based view of your data addresses these problems because it surfaces a consistent view of the real-world objects and relationships in your data, independent of the raw representation.

A model defines logical entity types, their properties, and the relationships between entities. For example, suppose your model includes logical customer and order entities. A customer entity includes a name property. An order entity includes an order number property. Customer and order entities are related in two ways: each order is associated with a customer, and each customer has a list of orders.
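As an analogy only, the customer and order example can be expressed in Python: each dataclass plays the role of an entity type, each object is an entity instance, and the object-valued fields are the relationships. Entity Services does not describe entity types in Python; the helper place_order and the field name order_id are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Customer:
    """Entity type: every Customer has a name and a list of orders."""
    name: str
    orders: list[Order] = field(default_factory=list)


@dataclass
class Order:
    """Entity type: every Order has an order number and one customer."""
    order_id: str
    # Excluded from repr/eq so the Customer <-> Order cycle cannot recurse.
    customer: Customer = field(repr=False, compare=False)


def place_order(customer: Customer, order_id: str) -> Order:
    """Create an Order and keep both sides of the relationship consistent."""
    order = Order(order_id=order_id, customer=customer)
    customer.orders.append(order)
    return order


alice = Customer(name="Alice")          # an entity instance
first = place_order(alice, "A-1001")    # another entity instance
print(first.customer.name, [o.order_id for o in alice.orders])
```

The two-way link mirrors the relationships described above: the order knows its customer, and the customer holds a list of orders.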

You might capture this information in a modeling diagram such as the following:

Entity modeling fits well with MarkLogic. You can ingest your heterogeneous raw data and immediately get value out of it, using MarkLogic's application development, search, and indexing features. These same features enable you to explore your data for purposes of data discovery. As you explore your data, you uncover entities and relationships that can be modeled.

Using the Entity Services API, you can capture your modeled entity types, properties, and relationships in a model descriptor, and then use the descriptor to create a model. Given a model, you can use Entity Services to generate a variety of artifacts on which to build your model-based application. The diagram below outlines this process. For more details, see Entity Services Overview.

You can build up a model iteratively. You do not need to finalize your model before you begin getting value from the model or your data. The model can grow and change as your data does, without negatively impacting downstream data consumers: model-based code can easily accommodate a new data source or a new data discovery, such as the need to expose a new entity type.

Modeling also enables you to expose different views of your data. For example, if you are modeling patient data, you might have one model that exposes a billing view of the data and another model that exposes a quality of care view of the data. Both models can sit on top of the same raw data set and need not be defined simultaneously.

Entity Services Overview

Entity Services is an API and a set of conventions you can use to quickly stand up an application based on entity modeling.

The Entity Services API provides the following services to facilitate application development based on entity modeling:

  • Modeling Vocabulary: The modeling vocabulary supported by Entity Services provides a structured way to describe entities, their properties, and relationships between entities.
  • Persistence Convention: The entity persistence pattern promoted by Entity Services defines a convention for representing harmonized entities, metadata, and raw data as documents. Your applications can standardize on a single pattern for storing and manipulating entities.
  • Application Scaffolding: You can use Entity Services to generate code and configuration artifacts from an entity model. This provides a well-defined framework on which to base an application.

Entity Services promotes a convention for implementing model-based applications, but it does not force this convention on you. For example, you can use the API to generate code for encapsulating entity instances, metadata, and raw source in an envelope document with a recommended structure. However, you are free to modify or replace this structure.

Modeling Vocabulary

Entity Services supports a modeling vocabulary in the form of a model descriptor. The descriptor syntax is based on Swagger and JSON Schema. A model descriptor not only identifies entity types, their properties, and relationships, but also captures information such as data types and metadata.

For example, recall the entity diagram from Why Use Entity Modeling?:

This diagram captures entity types and relationships, but does not include data types and other details that a developer needs. Entity Services uses a model descriptor to capture detailed entity type definitions and metadata in one place. This enables data stewards and developers to share a common view of the model.

The model descriptor is the basis for creating a model, generating code templates, and generating schemas and configuration artifacts. An Entity Services model descriptor can be expressed in either XML or JSON.

A JSON descriptor for the above diagram might look like the following. Metadata about the model is captured in the info section, while the entity types, their properties, and relationships are captured in the definitions section.

{ "info": {
    "title": "OrderTracker",
    "version": "1.0.0",
    "baseUri": "http://acme.com/sales/",
    "description": "A model of customer order tracking"
  },
  "definitions": {
    "Customer": {
      "properties": {
        "name": { "datatype": "string" },
        "orders": {
          "datatype": "array",
          "items": { "$ref": "#/definitions/Order" }
        }
      }
    },
    "Order": {
      "properties": {
        "orderId": { "datatype": "string" },
        "customer": { "$ref": "#/definitions/Customer" }
      }
    }
  }
}
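Because a model descriptor is plain JSON, ordinary tools can inspect it. The following Python sketch parses the OrderTracker descriptor and separates each entity type's scalar properties from its relationships. It assumes references use a "$ref" key (as in JSON Schema) and that an array-typed reference property is one-to-many; the function summarize is hypothetical and is not an Entity Services API.

```python
import json

# OrderTracker descriptor; assumes "$ref" as the reference key.
DESCRIPTOR_JSON = """
{ "info": {
    "title": "OrderTracker",
    "version": "1.0.0",
    "baseUri": "http://acme.com/sales/",
    "description": "A model of customer order tracking"
  },
  "definitions": {
    "Customer": {
      "properties": {
        "name": { "datatype": "string" },
        "orders": {
          "datatype": "array",
          "items": { "$ref": "#/definitions/Order" }
        }
      }
    },
    "Order": {
      "properties": {
        "orderId": { "datatype": "string" },
        "customer": { "$ref": "#/definitions/Customer" }
      }
    }
  }
}
"""


def summarize(descriptor):
    """Split each entity type's properties into scalars and relationships."""
    summary = {}
    for type_name, type_def in descriptor["definitions"].items():
        scalars, relationships = {}, {}
        for prop_name, prop in type_def["properties"].items():
            ref = prop.get("$ref") or prop.get("items", {}).get("$ref")
            if ref:
                target = ref.rsplit("/", 1)[-1]
                # An array of references is treated as one-to-many.
                many = prop.get("datatype") == "array"
                relationships[prop_name] = f"{target}[]" if many else target
            else:
                scalars[prop_name] = prop["datatype"]
        summary[type_name] = {"scalars": scalars, "relationships": relationships}
    return summary


model = json.loads(DESCRIPTOR_JSON)  # fails if the descriptor is not strict JSON
for name, parts in summarize(model).items():
    print(name, parts)
```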

You can express additional requirements, such as which properties are required and which properties should be indexed for efficient search.
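To illustrate what a "required" declaration means for instance data, the following Python sketch reports which required properties an instance lacks. It assumes the requirement is expressed as a JSON Schema style "required" array on the entity type definition; ORDER_TYPE and missing_required are hypothetical names, and index-related settings are not shown. In practice you would more likely validate against a schema generated from the model.

```python
# Hypothetical entity type definition with a JSON Schema style "required" list.
ORDER_TYPE = {
    "required": ["orderId", "customer"],
    "properties": {
        "orderId": {"datatype": "string"},
        "customer": {"$ref": "#/definitions/Customer"},
        "notes": {"datatype": "string"},
    },
}


def missing_required(type_def, instance):
    """Return required property names that are absent or null in an instance."""
    return [name for name in type_def.get("required", [])
            if instance.get(name) is None]


print(missing_required(ORDER_TYPE, {"orderId": "A-1001"}))
```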

For more details, see Creating and Managing Models.

Persistence Convention

When you follow the Entity Services paradigm, you persist two kinds of modeling-related artifacts in the database: the model and entity instance envelope documents.

When you persist a model descriptor in MarkLogic as a document in the special Entity Services collection, MarkLogic generates a model from the descriptor. This model is a graph of semantic triples representing facts about the model. The initial set of facts consists of those that can be derived from the model descriptor. You can then extend the model to include your own facts, in the form of additional triples. For more details, see Creating and Managing Models.
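The idea of "facts derived from the descriptor, then extended with your own triples" can be sketched in Python as a list of (subject, predicate, object) tuples. This is only an illustration: the model IRI pattern (baseUri + title + "-" + version), the "ex:" predicates, and the extra steward fact are assumptions made for this sketch, not the vocabulary or triples MarkLogic actually generates.

```python
# Hypothetical sketch: derive simple (subject, predicate, object) facts from a
# descriptor. The IRI pattern and "ex:" predicates are illustrative assumptions.
DESCRIPTOR = {
    "info": {"title": "OrderTracker", "version": "1.0.0",
             "baseUri": "http://acme.com/sales/"},
    "definitions": {
        "Customer": {"properties": {"name": {"datatype": "string"}}},
        "Order": {"properties": {"orderId": {"datatype": "string"}}},
    },
}


def descriptor_facts(descriptor):
    """Return the facts that can be derived directly from the descriptor."""
    info = descriptor["info"]
    # Assumed model IRI: baseUri + title + "-" + version.
    model_iri = f'{info["baseUri"]}{info["title"]}-{info["version"]}'
    facts = [(model_iri, "ex:title", info["title"]),
             (model_iri, "ex:version", info["version"])]
    for type_name, type_def in descriptor["definitions"].items():
        type_iri = f"{model_iri}/{type_name}"
        facts.append((model_iri, "ex:definesType", type_iri))
        for prop_name, prop in type_def.get("properties", {}).items():
            prop_iri = f"{type_iri}/{prop_name}"
            facts.append((type_iri, "ex:hasProperty", prop_iri))
            if "datatype" in prop:
                facts.append((prop_iri, "ex:datatype", prop["datatype"]))
    return facts


facts = descriptor_facts(DESCRIPTOR)
# Extending the model: add your own (hypothetical) fact next to the derived ones.
facts.append(("http://acme.com/sales/OrderTracker-1.0.0/Customer",
              "ex:steward", "Sales Operations"))
for fact in facts:
    print(fact)
```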

The following diagram depicts the key parts of an entity model:

By convention, an entity instance is persisted in MarkLogic as part of an envelope document that encapsulates the instance, instance metadata, and the raw source data from which the instance is derived. You manage envelope documents like any other document in MarkLogic. You can use Entity Services to generate configuration and other artifacts that facilitate searching instance data stored in the recommended envelope layout. For more details, see Managing Entity Instances.
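The following Python sketch illustrates the envelope convention: it converts a raw record into a Customer instance and wraps the instance, some metadata, and the original record in a single document. It plays a role similar to a generated instance converter, but it is not generated code. The raw field names (CUST_NM, ORDER_IDS), the source label, storing orders as a list of orderId values, and the "headers"/"instance"/"attachments" keys are all assumptions; see Managing Entity Instances for the layout Entity Services actually recommends.

```python
import json
from datetime import datetime, timezone

# Hypothetical raw source record whose field names differ from the model.
RAW_CUSTOMER = {"CUST_NM": "Alice", "ORDER_IDS": "A-1001;A-1002"}


def to_customer(raw):
    """Convert a raw record into a Customer instance (mapping is hypothetical)."""
    # Assumption: orders are stored as references by orderId.
    order_ids = [oid for oid in raw.get("ORDER_IDS", "").split(";") if oid]
    return {"name": raw["CUST_NM"], "orders": order_ids}


def make_envelope(raw, source_name):
    """Wrap instance, metadata, and raw source in one document.

    The "headers"/"instance"/"attachments" keys are an assumed layout.
    """
    return {
        "envelope": {
            "headers": {
                "source": source_name,
                "ingested": datetime.now(timezone.utc).isoformat(),
            },
            "instance": {"Customer": to_customer(raw)},
            "attachments": [raw],  # keep the original record for lineage
        }
    }


doc = make_envelope(RAW_CUSTOMER, "crm-export")
print(json.dumps(doc["envelope"]["instance"], indent=2))
```

Keeping the raw record alongside the harmonized instance means you can re-run a changed conversion later without returning to the original source.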

Application Scaffolding

Once you create a model, you can use it with Entity Services to generate code, schemas, and configuration artifacts to help you create a model-based application. The generated code and artifacts are designed to be customized and extended to meet the needs of your application. Entity Services does not enforce any particular data layout or code pattern.

You can generate the following code modules using Entity Services. The input in all cases is a model descriptor. You are expected to customize the generated code to meet the needs of your application.

  • Instance Converter Module: A code template for converting raw source data into entity instances and encapsulating the instances into entity envelope documents. The code will run as-is, but you will need to customize the code to meet the needs of your application.
  • Version Translator Module: A code template for converting entity instances between different versions of a model. For example, if you add a new entity type or a new entity property, you can use a version translator module to upgrade your existing entity instances to the new version of the model.
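As a rough illustration of what a version translator does, the following Python sketch upgrades a Customer instance to a hypothetical version 2.0.0 of the model that adds an optional email property. The function name, the email property, and the default value are assumptions; a generated translator module is code you customize, not this function.

```python
# Hypothetical: model version 2.0.0 adds an optional "email" property to Customer.
def translate_customer_v1_to_v2(v1_instance, default_email=None):
    """Return a Customer instance shaped for model version 2.0.0."""
    v2 = dict(v1_instance)  # shallow copy; the v1 dict itself is not modified
    v2.setdefault("email", default_email)  # keep a value if one already exists
    return v2


old = {"name": "Alice", "orders": ["A-1001"]}
new = translate_customer_v1_to_v2(old)
print(new)
```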

You can generate the following additional artifacts using Entity Services. The input in all cases is a model descriptor. You can extend or customize any or all of these artifacts, if needed, but they all deliver value to your application as-is.

  • Model Schema: An XML schema derived from the model. Useful for validating entity instances. For example, when harmonizing source data with your model, you can use schema validation to ensure your envelope documents contain correct entity instances.
  • Template Driven Extraction Template: A TDE template that can be used to generate views of your instance data as rows or triples. If you deploy the template, you can use interfaces such as SQL, SPARQL, and the Optic API to query your instances.
  • Query Options: A set of query options usable with the Search API and the REST, Java, and Node.js Client APIs. For example, the options define a constraint for each required property of an entity type and limit search results to returning just the canonical instance data from an envelope document.
  • Database Configuration: A database configuration file compatible with ml-gradle that can be used to create indexes and lexicons based on your entity type definitions. You can easily extract the configuration to use with the REST Management API rather than ml-gradle.
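To give a feel for what a row view of instance data means, the following Python sketch projects Order instances onto fixed columns and filters them, much as you might query a view exposed by a deployed TDE template. This is an analogy in plain Python, not TDE template syntax or SQL; the sample data and the to_rows helper are hypothetical.

```python
# Illustration only: a row view of Order instances, similar in spirit to what a
# deployed TDE template lets you query with SQL or the Optic API.
ORDERS = [
    {"orderId": "A-1001", "customer": "Alice"},
    {"orderId": "A-1002", "customer": "Alice"},
    {"orderId": "B-2001", "customer": "Bob"},
]


def to_rows(instances, columns):
    """Project each instance onto a fixed list of columns (missing -> None)."""
    return [tuple(inst.get(col) for col in columns) for inst in instances]


rows = to_rows(ORDERS, ["orderId", "customer"])
# Comparable in spirit to a SQL query that filters an order view by customer.
alice_orders = [order_id for order_id, customer in rows if customer == "Alice"]
print(alice_orders)
```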

For more details, see Generating Code and Other Artifacts.

Next Steps

Use the following suggestions to continue learning about Entity Services:

Exploring the Entity Services Open-Source Examples

The Entity Services library is automatically installed when you install MarkLogic Server. The library is no longer maintained as an open-source project on GitHub, but the GitHub project does contain several examples, which we recommend you download and review.

The examples in this guide are simple ones based on data from the GitHub examples, but they are independent of the GitHub examples. You might still wish to explore the GitHub examples because they illustrate end-to-end integration of Entity Services with other MarkLogic tools and interfaces.

The example directory of the project can be found at the following URL:

http://github.com/marklogic/entity-services/tree/master/entity-services-examples

Before you can deploy and run the examples, you must create a local copy of the project. You can do this using the git tool (or other git client), or by downloading a zip file from GitHub. For details, see one of the following topics:

Detailed instructions for deploying and running these examples are on GitHub.

Downloading the Project as a ZIP File

To obtain a local copy from a ZIP file, follow these steps:

  1. Navigate to the following URL in your browser: http://github.com/marklogic/entity-services. The entity-services project home page on GitHub is displayed.
  2. Click the Clone or download dropdown. A dialog box appears.
  3. Click Download ZIP. When prompted, choose a location in which to save the ZIP file and click Save.
  4. Unzip the downloaded file to a folder of your choice. By default, this creates a folder named entity-services-branch, where branch is the name of the branch you downloaded. For example, you will have a directory named entity-services-master if you downloaded the master branch.
  5. Change directory into entity-services-branch/entity-services-examples.
  6. Follow the instructions on this page to configure, deploy, and run the examples:
    http://github.com/marklogic/entity-services/blob/master/entity-services-examples/README.md
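If you prefer to script steps 4 and 5, the following Python sketch unzips the downloaded archive and returns the path to the entity-services-examples directory. The helper name extract_examples is hypothetical, and it assumes the archive contains a single top-level folder such as entity-services-master, as described in step 4.

```python
import zipfile
from pathlib import Path


def extract_examples(zip_path, dest_dir):
    """Unzip a GitHub ZIP download and return the entity-services-examples path.

    Assumes the archive has a single top-level folder such as
    entity-services-master/, as described in step 4.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        top_level = {name.split("/", 1)[0] for name in archive.namelist()}
    if len(top_level) != 1:
        raise ValueError(f"expected one top-level folder, found {sorted(top_level)}")
    examples = dest / top_level.pop() / "entity-services-examples"
    if not examples.is_dir():
        raise FileNotFoundError(examples)
    return examples
```

For example, extract_examples("entity-services-master.zip", "work") would return work/entity-services-master/entity-services-examples, where you can then follow the README.md instructions from step 6.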

Security Considerations

No special security privileges or roles are needed to use the Entity Services API.

The entity envelope documents, code modules, schemas, and other artifacts you generate when using the Entity Services API are generic and can be secured using the same mechanisms as other documents and modules. For example, you should use document permissions to manage access to your envelope documents and persisted model descriptor.

Special privileges might be required to deploy some of the generated artifacts. For example, the user who installs generated code modules must have permission to insert into the modules database. Similarly, the user who installs a TDE template created using Entity Services requires the tde-admin role or equivalent privileges, as when installing any other template.
