Entities

Overview

Entities are the high-level business objects in your enterprise, such as an employee, product, purchase order, or department.

In MarkLogic Data Hub, you can use MarkLogic's Entity Services to create models of your business entities and to generate code scaffolding, database configurations, index settings, and validations based on those models. Entity Services handles the model definition and entity instance documents through API calls. If you use your own abstract entities, you must provide this framework.

Note: MarkLogic strongly recommends that you use Entity Services unless you have specific needs that it cannot address.

An entity model consists of entity types.

  • An entity type is a data type to which each record of your data is standardized, so that data from all sources can be viewed and used uniformly. An entity type consists of entity properties.
  • An entity property is a data field in a record of your data.
  • An entity instance is a copy of the entity type structure, which is added to your data record during mapping and then populated with values from the raw data.
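
As a rough illustration of how these three terms relate, the sketch below builds a model as a plain Python dictionary and derives an instance from it. The layout (info, definitions, properties, datatype) follows the general shape of an Entity Services model descriptor, but the version number, the example.org baseUri, the sample values, and the helper new_instance are assumptions made for illustration; they are not part of Data Hub.

```python
import json

# Hypothetical model: one entity type (Customer) with three entity properties.
customer_model = {
    "info": {
        "title": "Customer",
        "version": "0.0.1",                # assumed version
        "baseUri": "http://example.org/",  # assumed placeholder URI
    },
    "definitions": {
        "Customer": {
            "primaryKey": "customerId",
            "properties": {
                "customerId": {"datatype": "integer"},
                "firstName": {"datatype": "string"},
                "lastName": {"datatype": "string"},
            },
        }
    },
}


def new_instance(model, entity_type):
    """Return an empty instance: one slot per property of the entity type."""
    properties = model["definitions"][entity_type]["properties"]
    return {name: None for name in properties}


# An instance starts as a copy of the type's structure and is then populated.
instance = new_instance(customer_model, "Customer")
instance.update({"customerId": 101, "firstName": "Ada", "lastName": "Lovelace"})
print(json.dumps(instance, indent=2))
```

In Data Hub itself, the instance is populated by a mapping step rather than by hand; the update call above only stands in for that step.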

All the data about an entity is consolidated in a single record, which contains the standardized entity instance, the original raw data, the provenance and lineage, and other metadata. The provenance and lineage information includes the changes that occurred from the raw data to the most recent entity instance.

Note: An entity model is not required to ingest your raw data. However, it is required to configure your mapping.

Entity Model Validation

Only valid entity models are loaded into the MarkLogic Server instance. A valid entity model meets the following requirements:

  • info/baseUri in the model must have a value that is a valid URI.
  • The model must pass the Entity Services validation function es:model-validate.
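
The first requirement can be approximated locally before a model is loaded. The following is a minimal sketch, assuming the model is available as a Python dictionary and that "valid URI" means an absolute URI with a scheme and a host; the rules Data Hub applies may differ. The function name has_valid_base_uri is hypothetical, and the check does not replace es:model-validate, which runs on MarkLogic Server.

```python
from urllib.parse import urlparse


def has_valid_base_uri(model):
    """Local pre-check only: info/baseUri must be an absolute URI (scheme + host)."""
    base_uri = model.get("info", {}).get("baseUri")
    if not isinstance(base_uri, str) or not base_uri.strip():
        return False
    try:
        parsed = urlparse(base_uri)
    except ValueError:
        # urlparse rejects some malformed inputs, such as a broken IPv6 host.
        return False
    return bool(parsed.scheme) and bool(parsed.netloc)


print(has_valid_base_uri({"info": {"baseUri": "http://example.org/"}}))
print(has_valid_base_uri({"info": {"baseUri": "not a uri"}}))
```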

If the entity model is valid and loaded, Data Hub creates the following in the STAGING and FINAL databases:

  • schema files:
    • /entities/YourEntityName.entity.json. The validated entity model returned by es:model-validate.
    • /entities/YourEntityName.entity.schema.json. The entity model as a JSON schema for use in XDMP validation (xdmp.jsonValidate).
    • /entities/YourEntityName.entity.xsd. The entity model as an XSD schema for use in XDMP validation (xdmp.validate).
  • a TDE template (/tde/YourEntityName-YourEntityVersion.tdex) with the following permissions:
    • read permissions for the data-hub-operator
    • update permissions for the data-hub-developer
    • default permissions allowed to the user who loaded the entity model to MarkLogic Server
    Important: In MarkLogic 10 and later versions, a user who is assigned only the admin role must be explicitly granted read access to view the TDE template.

    If a TDE template with the same URI already exists but is not in the ml-data-hub-tde collection, a new TDE template will not be created.
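
The naming patterns listed above can be expanded for a concrete entity with a small helper. This is only a sketch of the URI patterns as written in this section; the function name artifact_uris, the dictionary keys, and the sample name and version are hypothetical.

```python
def artifact_uris(entity_name, entity_version):
    """Expand the URI patterns listed above for one entity model."""
    return {
        "model": f"/entities/{entity_name}.entity.json",
        "json_schema": f"/entities/{entity_name}.entity.schema.json",
        "xml_schema": f"/entities/{entity_name}.entity.xsd",
        "tde_template": f"/tde/{entity_name}-{entity_version}.tdex",
    }


# Sample entity name and version, chosen for illustration.
for kind, uri in artifact_uris("Customer", "0.0.1").items():
    print(kind, uri)
```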

Guidance on Mapping Multiple Entities from a Single Source

The topics below illustrate how to create different types of entity relationships using the Hub Central one-to-many modeling and mapping tools.

Creating a 1:1 Relationship

You can create a 1:1 relationship between entities when an entity instance is related to a single other entity instance.

For example, the Customer entity is related to the BabyRegistry entity, and each customer maintains a single baby registry. The relationship is 1:1.

Here are the modeled entities before defining the 1:1 relationship:

Customer
customerId  integer      
firstName   string
lastName    string
 
BabyRegistry
babyRegId   integer      
arrivalDate date

To model the 1:1 relationship, define a property in one of the participating entities that points to the related entity. The other entity in the relationship is untouched.

Here, the BabyRegistry property is defined in the Customer entity:

Customer
customerId integer       
firstName  string
lastName   string
created    BabyRegistry

Alternatively, a Customer property can be defined in the BabyRegistry entity:

BabyRegistry
babyRegId   integer      
arrivalDate date
createdBy   Customer

Which entity you choose to hold the property may depend on the lifecycle or complexity of the entities involved.

When defining the mapping, point to the related entity's primary key to establish the relationship. Here is how the mapping is defined when the reference property is in the Customer entity:

Customer
customerId  /CustomerID
firstName   /Name/FirstName
lastName    /Name/LastName
created     /BabyRegistry/BabyRegistryId

Here is how it is defined when the reference property is in the BabyRegistry entity:

BabyRegistry
babyRegId   /BabyRegistry/BabyRegistryId
arrivalDate /BabyRegistry/Arrival_Date
createdBy   /CustomerID
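
To make the effect of these two mappings concrete, here is a minimal sketch that applies them to one source record held as a nested Python dictionary. The sample values and the helpers resolve and apply_mapping are assumptions for illustration; resolve is a simplified stand-in for the path expressions evaluated by a mapping step, not the Data Hub implementation.

```python
# Hypothetical source record containing one customer and its single registry.
source = {
    "CustomerID": 101,
    "Name": {"FirstName": "Ada", "LastName": "Lovelace"},
    "BabyRegistry": {"BabyRegistryId": 5001, "Arrival_Date": "2024-06-01"},
}


def resolve(record, path):
    """Follow a slash-separated path such as /Name/FirstName through nested dicts."""
    value = record
    for step in path.strip("/").split("/"):
        value = value[step]
    return value


def apply_mapping(record, mapping):
    """Build an entity instance: one value per mapped property."""
    return {prop: resolve(record, path) for prop, path in mapping.items()}


# Option 1: the reference property lives in Customer.
customer = apply_mapping(source, {
    "customerId": "/CustomerID",
    "firstName": "/Name/FirstName",
    "lastName": "/Name/LastName",
    "created": "/BabyRegistry/BabyRegistryId",  # BabyRegistry primary key
})

# Option 2: the reference property lives in BabyRegistry.
registry = apply_mapping(source, {
    "babyRegId": "/BabyRegistry/BabyRegistryId",
    "arrivalDate": "/BabyRegistry/Arrival_Date",
    "createdBy": "/CustomerID",  # Customer primary key
})

print(customer)
print(registry)
```

In both options, the reference property holds the primary key value of the related instance, which is what links the two instances.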

Creating an N:1 Relationship

You can create an N:1 relationship between entities when one or more entity instances are related to one instance of another entity.

In our example, the Order entity is related to the Customer entity, and each customer can create multiple orders. Each order is associated with a single customer. The relationship is N:1.

Here are the modeled entities before defining the N:1 relationship:

Customer
customerId  integer      
firstName   string
lastName    string
 
Order                                          
orderId       integer    
timestamp     dateTime

To model the N:1 relationship, create a property on the many (N) side of the relationship that points to the other entity. In our example, the Order is on the many side, so we create a property in Order that points to Customer:

Order                                          
orderId       integer     
timestamp     dateTime
orderedBy     Customer    

When defining the mapping, point the property on the many side to the other entity's primary key. In our example, the orderedBy property in Order points to the Customer primary key:

Order
orderId          /Orders/OrderId     
timestamp        /Orders/DateAndTime
orderedBy        /CustomerID
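
The sketch below shows the outcome of this mapping for a source record with two orders. It assumes the orders arrive as a list under Orders within the customer record; that shape, the sample values, and the function map_orders are assumptions for illustration rather than Data Hub behavior.

```python
# Hypothetical source record: one customer with two orders.
source = {
    "CustomerID": 101,
    "Orders": [
        {"OrderId": 1, "DateAndTime": "2024-01-05T10:00:00"},
        {"OrderId": 2, "DateAndTime": "2024-02-11T16:30:00"},
    ],
}


def map_orders(record):
    """Create one Order instance per source order, each pointing at the customer."""
    customer_key = record["CustomerID"]  # Customer primary key
    return [
        {
            "orderId": order["OrderId"],
            "timestamp": order["DateAndTime"],
            "orderedBy": customer_key,
        }
        for order in record["Orders"]
    ]


orders = map_orders(source)
for order in orders:
    print(order)
```

Every Order instance carries the same orderedBy value, so many orders point to one customer while the Customer entity itself stays unchanged.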
 

Creating an N:M Relationship

You can create an N:M relationship between entities when an entity instance is related to one or more instances of another entity and vice-versa.

In our example, the Order and Product entities are related. An order can have one or more products and a product can be part of one or more orders. The relationship is N:M.

Here are the modeled entities before defining the relationship:

Order                                          
orderId       integer    
timestamp     dateTime
 
Product                                        
productId     integer
name          string

To model the N:M relationship, you define a property in one of the participating entities that points to the related entity. Similar to the 1:1 case above, the relationship can be modeled by defining a property in either of the participating entities. (The other entity in the relationship is left untouched.)

Here, a Product property is defined in the Order entity:

Order                                          
orderId       integer    
timestamp     dateTime
includes      Product    yes

The cardinality of the property is also marked as multiple (the yes in the last column).

Alternatively, an Order property can be defined in the Product entity:

Product                                        
productId     integer
name          string
isPartOf      Order      yes

Which entity you choose to hold the property may depend on the lifecycle or complexity of the entities involved.

When defining the mapping, point to the related entity's primary key to establish the relationship. Here is how the mapping is defined when the reference property is in the Order entity:

Order
orderId          /Orders/OrderId     
timestamp        /Orders/DateAndTime
includes         /Orders/Products/ProductId

Here is how it is defined when the reference property is in the Product entity:

Product
productId        /Orders/Products/ProductId
name             /Orders/Products/Name
isPartOf         /Orders/OrderId

Because the relationship is N:M, the properties in the resulting entity instances will have arrays as their datatypes, and the arrays will hold one or more values based on the number of related entities.
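
The array-valued properties can be illustrated with a short sketch that derives both variants from the same source. It assumes each source order carries a list of products; that shape, the sample values, and the functions map_orders and map_products are assumptions for illustration, not the Data Hub implementation.

```python
# Hypothetical source: two orders that share one product.
source = {
    "Orders": [
        {
            "OrderId": 1,
            "DateAndTime": "2024-01-05T10:00:00",
            "Products": [
                {"ProductId": 10, "Name": "Crib"},
                {"ProductId": 11, "Name": "Stroller"},
            ],
        },
        {
            "OrderId": 2,
            "DateAndTime": "2024-02-11T16:30:00",
            "Products": [{"ProductId": 11, "Name": "Stroller"}],
        },
    ]
}


def map_orders(record):
    """Reference property in Order: includes holds an array of product keys."""
    return [
        {
            "orderId": order["OrderId"],
            "timestamp": order["DateAndTime"],
            "includes": [p["ProductId"] for p in order["Products"]],
        }
        for order in record["Orders"]
    ]


def map_products(record):
    """Reference property in Product: isPartOf holds an array of order keys."""
    products = {}
    for order in record["Orders"]:
        for p in order["Products"]:
            instance = products.setdefault(
                p["ProductId"],
                {"productId": p["ProductId"], "name": p["Name"], "isPartOf": []},
            )
            instance["isPartOf"].append(order["OrderId"])
    return list(products.values())


orders = map_orders(source)
products = map_products(source)
print(orders)
print(products)
```

Only one of the two variants is needed in a given model; the sketch computes both to show that either side can hold the array of references.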