Business analysts often describe processes in terms of logical business entities, such as Customers and Orders, and the relationships between them. MarkLogic Entity Services is a set of tools and interfaces that make it easier to create applications that manipulate these business entities, even when your raw data has a different structure.
You can use Entity Services to model your business entities and generate code and configuration artifacts that facilitate creating, querying, and exporting entity instances.
The material in this guide assumes the reader is familiar with the following terms and definitions:
Term | Definition |
---|---|
model descriptor | A definition of a set of entity types, their properties, and relationships. You use a descriptor to create a model and model-based application code and configuration artifacts. For more details, see Creating and Managing Models. |
model | A model includes entity type definitions, entity property definitions, relationships between entity types, and facts about the model (as semantic triples). A model descriptor contributes the entity type and entity property definitions, and relationships between entities. MarkLogic generates a default set of facts from the descriptor, and you can add additional facts to the model. For details, see Creating and Managing Models. |
entity | An abstraction of a logical business object that can be stored and manipulated by applications. For example, a sales model might include entities such as a customer, order, or inventory item. |
entity type | A definition of the characteristics of an entity instance, including its properties and relationships to other entities. |
entity instance | A concrete instantiation of an entity type, as represented by a populated data structure representing an individual entity, or a document containing such a data structure. |
entity property | A concrete characteristic of an entity type. For example, a customer entity type might have properties such as a name, address, and customer id. Entity properties whose type is an entity type express an entity relationship. |
entity relationship | A logical relationship between entity types. For example, an order entity type might include relationships with customer and inventory item entities. In Entity Services, an entity relationship is expressed as an entity property whose type is an entity type (rather than a scalar or array type). For details, see Defining Entity Relationships. |
envelope document | By Entity Services convention, a document that encapsulates an entity instance, metadata, and, optionally, the raw source from which the entity was generated. For details, see Managing Entity Instances. |
local reference | In a model descriptor, a reference to an entity type that can be fully resolved within that descriptor. For example, if a model defines Race and Runner entity types, and a Race entity type has a property that is an array of references to Runners, then those references are local references. For details, see Defining Entity Relationships. |
external reference | In a model descriptor, a reference to an entity type that is not defined within the same descriptor. For details, see Defining Entity Relationships. |
TDE template | A Template Driven Extraction (TDE) template. Use Entity Services to generate a template that enables querying your entity instance data as rows or semantic triples. For details, see Generating a TDE Template and Search Basics for Instance Data. |
harmonization | The process of transforming data from disparate sources into a common, model-based representation. |
data hub | An application that takes in raw data from disparate sources and transforms the data into canonical business entities that can be used by applications without regard to differences in the original source. |
Enterprise applications must often work with data from multiple sources. The data shares common conceptual objects, such as customer or order, but representation details can differ significantly. The meaning of the data is spread across schemas, application code, ETL code, and the minds of developers, DBAs, and data stewards.
Working directly with this heterogeneous data imposes cognitive load on developers and adds complexity to applications. A model-based view of your data eliminates these problems because it surfaces a consistent view of the real-world objects and relationships in your data, independent of the raw representation.
A model defines logical entity types, their properties, and the relationships between entities. For example, say your model includes logical customer and order entities. A customer entity includes a name property. An order entity includes an order number property. There are relationships between customer and order entities: a customer is associated with each order, and a customer has a list of orders.
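As a minimal sketch of what a consistent, model-based view means in practice, the following Python snippet maps two differently shaped raw records onto one common customer shape. The source systems, field names, and the `customerId` property are hypothetical and chosen only for illustration; this is plain Python, not Entity Services code.

```python
# Two hypothetical raw records describing customers, as they might arrive
# from different source systems. All field names are invented for illustration.
crm_record = {"FullName": "Ada Lovelace", "CustID": "C-100"}
billing_record = {"customer": {"first": "Alan", "last": "Turing"}, "acct_no": 200}


def from_crm(raw):
    """Map a CRM-style record onto the common customer shape."""
    return {"customerId": str(raw["CustID"]), "name": raw["FullName"]}


def from_billing(raw):
    """Map a billing-style record onto the common customer shape."""
    person = raw["customer"]
    return {
        "customerId": str(raw["acct_no"]),
        "name": f'{person["first"]} {person["last"]}',
    }


customers = [from_crm(crm_record), from_billing(billing_record)]

# Downstream code sees one shape, regardless of the original source.
for customer in customers:
    print(customer["customerId"], customer["name"])
```

The per-source mapping functions are the only place that knows about the raw layouts, so adding a new source means adding one more mapping rather than touching the consumers.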
You might capture this information in a modeling diagram such as the following:
Entity modeling fits well with MarkLogic. You can ingest your heterogeneous raw data and immediately get value out of it, using MarkLogic's application development, search, and indexing features. These same features enable you to explore your data for purposes of data discovery. As you explore your data, you uncover entities and relationships that can be modeled.
Using the Entity Services API, you can capture your modeled entity types, properties, and relationships in a model descriptor, and then use the descriptor to create a model. Given a model, you can use Entity Services to generate a variety of artifacts on which to build your model-based application. The diagram below outlines this process. For more details, see Entity Services Overview.
You can build up a model iteratively. You do not need to finalize your model to begin getting value from the model or your data. The model can grow and change as your data does, without negatively impacting downstream data consumers: model-based code can easily accommodate a new data source or a new data discovery, such as the need to expose a new entity type.
Modeling also enables you to expose different views of your data. For example, if you are modeling patient data, you might have one model that exposes a billing view of the data and another model that exposes a quality of care view of the data. Both models can sit on top of the same raw data set and need not be defined simultaneously.
Entity Services is an API and a set of conventions you can use to quickly stand up an application based on entity modeling.
The Entity Services API provides the following services to facilitate application development based on entity modeling:
Entity Services promotes a convention for implementing model-based applications, but it does not force this convention on you. For example, you can use the API to generate code for encapsulating entity instances, metadata, and raw source in an envelope document with a recommended structure. However, you are free to modify or replace this structure.
Entity Services supports a modeling vocabulary in the form of a model descriptor. The descriptor syntax is based on Swagger and JSON Schema. A model descriptor not only identifies entity types, their properties, and relationships, but also captures information such as data types and metadata.
For example, recall the entity diagram from Why Use Entity Modeling?:
This diagram captures entity types and relationships, but does not include data types and other details required by a developer. Entity Services uses a model descriptor to capture detailed entity type definitions and metadata in one place. This enables data stewards and developers to share a common view of the model.
The model descriptor is the basis for creating a model, generating code templates, and generating schemas and configuration artifacts. An Entity Services model descriptor can be expressed in either XML or JSON.
A JSON descriptor for the above diagram might look like the following. Metadata about the model is captured in the info section, while the entity types, their properties, and relationships are captured in the definitions section.
    {
      "info": {
        "title": "OrderTracker",
        "version": "1.0.0",
        "baseUri": "http://acme.com/sales/",
        "description": "A model of customer order tracking"
      },
      "definitions": {
        "Customer": {
          "properties": {
            "name": { "datatype": "string" },
            "orders": {
              "datatype": "array",
              "items": { "$ref": "#/definitions/Order" }
            }
          }
        },
        "Order": {
          "properties": {
            "orderId": { "datatype": "string" },
            "customer": { "$ref": "#/definitions/Customer" }
          }
        }
      }
    }
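To make the structure of the definitions section more concrete, the following Python sketch walks a trimmed copy of the OrderTracker descriptor and lists the properties that express entity relationships. It assumes references are written with a `$ref` key, following the JSON Schema convention; the `relationships` helper and the rule used to decide whether a reference is local are illustrative assumptions, not part of the Entity Services API.

```python
import json

# A trimmed copy of the OrderTracker descriptor. The "$ref" key name is an
# assumption based on the JSON Schema convention.
DESCRIPTOR = json.loads("""
{
  "info": {"title": "OrderTracker", "version": "1.0.0"},
  "definitions": {
    "Customer": {
      "properties": {
        "name": {"datatype": "string"},
        "orders": {"datatype": "array", "items": {"$ref": "#/definitions/Order"}}
      }
    },
    "Order": {
      "properties": {
        "orderId": {"datatype": "string"},
        "customer": {"$ref": "#/definitions/Customer"}
      }
    }
  }
}
""")

REF_KEY = "$ref"
LOCAL_PREFIX = "#/definitions/"


def relationships(descriptor):
    """Yield (entity type, property, target reference, is_local) tuples."""
    definitions = descriptor["definitions"]
    for type_name, type_def in definitions.items():
        for prop_name, prop in type_def.get("properties", {}).items():
            # A reference can sit on the property itself or on its array items.
            target = prop.get(REF_KEY) or prop.get("items", {}).get(REF_KEY)
            if target is None:
                continue  # scalar property, not a relationship
            # Treat a reference as local only if it resolves in this descriptor.
            is_local = (
                target.startswith(LOCAL_PREFIX)
                and target[len(LOCAL_PREFIX):] in definitions
            )
            yield type_name, prop_name, target, is_local


for row in relationships(DESCRIPTOR):
    print(row)
```

In this sketch both references resolve within the descriptor, so both are reported as local; a reference that cannot be resolved this way would be reported as not local.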
You can express additional requirements, such as which properties are required and which properties should be indexed for efficient search.
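As a rough illustration of how a required-property declaration could be used, the following Python sketch checks an instance against an entity type definition. The `required` key name is an assumption borrowed from JSON Schema, and `ORDER_TYPE`, its `note` property, and the `missing_required` helper are hypothetical; consult Creating and Managing Models for the actual descriptor syntax.

```python
# Hypothetical entity type definition. The "required" key name is an
# assumption borrowed from JSON Schema.
ORDER_TYPE = {
    "properties": {
        "orderId": {"datatype": "string"},
        "note": {"datatype": "string"},
    },
    "required": ["orderId"],
}


def missing_required(type_def, instance):
    """Return the required property names absent from an instance, sorted."""
    required = type_def.get("required", [])
    return sorted(name for name in required if name not in instance)


print(missing_required(ORDER_TYPE, {"orderId": "A-1"}))  # nothing missing
print(missing_required(ORDER_TYPE, {"note": "rush"}))    # orderId is missing
```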
For more details, see Creating and Managing Models.
When you follow the Entity Services paradigm, you persist two kinds of modeling-related artifacts in the database: the model and entity instance envelope documents.
When you persist a model descriptor in MarkLogic as a document in the special Entity Services collection, MarkLogic generates a model from the descriptor. This model is a graph of semantic triples representing facts about the model. The initial set of facts are those that can be derived from the model descriptor. You can then extend the model to include your own facts, in the form of additional triples. For more details, see Creating and Managing Models.
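The idea of a model as a set of facts can be sketched in a few lines of Python. The example below derives simple (subject, predicate, object) tuples from a descriptor and then appends a user-supplied fact. The IRI scheme, the predicate names `hasEntityType`, `hasProperty`, and `sameConceptAs`, and the `model_facts` helper are all hypothetical; the triples MarkLogic actually generates use its own vocabulary.

```python
# A small descriptor used only for this sketch.
DESCRIPTOR = {
    "info": {
        "title": "OrderTracker",
        "version": "1.0.0",
        "baseUri": "http://acme.com/sales/",
    },
    "definitions": {
        "Customer": {"properties": {"name": {"datatype": "string"}}},
        "Order": {"properties": {"orderId": {"datatype": "string"}}},
    },
}


def model_facts(descriptor):
    """Derive illustrative (subject, predicate, object) facts from a descriptor."""
    info = descriptor["info"]
    # Hypothetical IRI scheme: base URI, then title and version.
    model_iri = f'{info["baseUri"]}{info["title"]}-{info["version"]}'
    facts = []
    for type_name, type_def in descriptor["definitions"].items():
        type_iri = f"{model_iri}/{type_name}"
        facts.append((model_iri, "hasEntityType", type_iri))
        for prop_name in type_def.get("properties", {}):
            facts.append((type_iri, "hasProperty", f"{type_iri}/{prop_name}"))
    return facts


facts = model_facts(DESCRIPTOR)

# Extending the model: add a fact that cannot be derived from the descriptor.
facts.append((
    "http://acme.com/sales/OrderTracker-1.0.0/Customer",
    "sameConceptAs",
    "http://example.org/vocab/Client",
))

for fact in facts:
    print(fact)
```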
The following diagram depicts the key parts of an entity model:
By convention, an entity instance is persisted in MarkLogic as part of an envelope document that encapsulates the instance, instance metadata, and the raw source data from which the instance is derived. You manage envelope documents like any other document in MarkLogic. You can use Entity Services to generate some configuration and other artifacts that facilitate searching instance data stored in the recommended envelope layout. For more details, see Managing Entity Instances.
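The envelope idea can be sketched in Python as a function that wraps an instance, some model metadata, and the optional raw source in one JSON document. The key names used here (`envelope`, `instance`, `info`, `attachments`) and the `make_envelope` helper are assumptions made for illustration; see Managing Entity Instances for the recommended layout.

```python
import json


def make_envelope(entity_type, instance, info, raw_source=None):
    """Wrap an entity instance, model info, and optional raw source."""
    # Key names are assumptions for this sketch, not a documented layout.
    envelope = {"instance": {"info": info, entity_type: instance}}
    if raw_source is not None:
        envelope["attachments"] = [raw_source]
    return {"envelope": envelope}


# Hypothetical raw record from a source system.
raw = {"FullName": "Ada Lovelace", "CustID": "C-100"}

doc = make_envelope(
    "Customer",
    {"name": raw["FullName"]},
    {"title": "OrderTracker", "version": "1.0.0"},
    raw_source=raw,
)
print(json.dumps(doc, indent=2))
```

Keeping the raw source next to the instance means the original data remains available if the mapping from source to instance later changes.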
Once you create a model, you can use it with Entity Services to generate code, schemas, and configuration artifacts to help you create a model-based application. The generated code and artifacts are designed to be customized and extended to meet the needs of your application. Entity Services does not enforce any particular data layout or code pattern.
You can generate the following code modules using Entity Services. The input in all cases is a model descriptor. You are expected to customize the generated code to meet the needs of your application.
You can generate the following additional artifacts using Entity Services. The input in all cases is a model descriptor. You can extend or customize any or all of these artifacts, if needed, but they all deliver value to your application as-is.
A database configuration artifact for ml-gradle that can be used to create indexes and lexicons based on your entity type definitions. You can easily extract the configuration to use with the REST Management API rather than ml-gradle.
For more details, see Generating Code and Other Artifacts.
Use the following suggestions to continue learning about Entity Services:
The Entity Services library is automatically installed when you install MarkLogic Server. The library is no longer being maintained as an open source project on GitHub. The GitHub project does contain several examples, which we recommend you download and review.
The examples in this guide are simple ones based on data from the GitHub examples, but they are independent of the GitHub examples. You might still wish to explore the GitHub examples because they illustrate end-to-end integration of Entity Services with other MarkLogic tools and interfaces.
The example directory of the project can be found at the following URL:
http://github.com/marklogic/entity-services/tree/master/entity-services-examples
Before you can deploy and run the examples, you must create a local copy of the project. You can do this using the git tool (or another git client), or by downloading a ZIP file from GitHub. For details, see one of the following topics:
Detailed instructions for deploying and running these examples are on GitHub.
To obtain a local copy from a ZIP file, follow these steps:
Unpacking the ZIP file creates a directory named entity-services-branch. For example, you will have a directory named entity-services-master if you downloaded the master branch.
The examples are in entity-services-branch/entity-services-examples.
For instructions on deploying and running the examples, see http://github.com/marklogic/entity-services/blob/master/entity-services-examples/README.md
No special security privileges or roles are needed to use the Entity Services API.
The entity envelope documents, code modules, schemas, and other artifacts you generate when using the Entity Services API are generic and can be secured using the same mechanisms as other documents and modules. For example, you should use document permissions to manage access to your envelope documents and persisted model descriptor.
Special privileges might be required to deploy some of the generated artifacts. For example, the user who installs generated code modules must have permission to insert into the modules database. Similarly, the user who installs a TDE template created using Entity Services requires the tde-admin role or equivalent privileges, as when installing any other template.