Mapping

Overview of mapping in Data Hub.

About Model-to-Model Mapping

A data model defines how data is structured. Each field in a dataset corresponds to a property in the data model, even if the data model is not explicitly defined.

Your source datasets might have data models that are different from each other. For example, one dataset might have a field called family-name and another might call the same field surname.

Model-to-model mapping associates the fields (properties) of your source dataset's data model with properties in a standardized data model, so that you can access the data the same way regardless of its source.

For example, you can create the property lastname in your standardized data model. Any request for the value of lastname would return the correct value whether the source field is called family-name or surname.
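The idea can be sketched in a few lines of Python. This is only an illustration of the concept, not Data Hub code: the FIELD_MAPPINGS table, the dataset names dataset_a and dataset_b, and the helper to_standard are all hypothetical.

```python
# Illustrative only: one mapping per source data model, all targeting
# the same standardized property names.
FIELD_MAPPINGS = {
    "dataset_a": {"family-name": "lastname"},
    "dataset_b": {"surname": "lastname"},
}


def to_standard(record, source):
    """Rename the mapped source fields of `record` to standardized property names."""
    mapping = FIELD_MAPPINGS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


a = to_standard({"family-name": "Smith"}, "dataset_a")
b = to_standard({"surname": "Jones"}, "dataset_b")
print(a["lastname"], b["lastname"])  # prints "Smith Jones"
```

A consumer that asks for lastname never needs to know which source field supplied the value.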

Mapping in MarkLogic Data Hub

In Data Hub, the standardized model is an entity model, which is a canonical representation of your entity or business object.

To define a mapping, you need:

  • An entity model, which you create.
  • One or more source datasets. If you create your mapping in QuickStart, you must also ingest at least one record from your source dataset.

Note: If you have multiple source datasets with different data models, you must create one mapping for each source data model against the same entity model.

You can define a mapping in two ways:

  • The easiest way to define a mapping is through QuickStart, when you create and configure a mapping step. QuickStart chooses an arbitrary ingested record from the STAGING database to determine the source fields that can be mapped against the entity model properties. You can select a different ingested record to generate the list of available source fields.
  • You can also define a mapping manually by creating a mapping definition file for each source dataset.

Mapping Functions

A basic mapping simply associates one entity property with one field in the source record. However, you can also calculate values to assign to the entity property by specifying a mapping expression.

A mapping expression is a valid XPath expression that can include functions and can use values from one or more source fields. For example, you can add the values of two source fields and save the sum in a single entity property.
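As a rough illustration of the difference between a basic mapping and a calculated one, the Python sketch below treats each entity property as an expression over the source record. In Data Hub these expressions are written in XPath; the Python lambdas only mirror the idea, and the field names surname, price, and tax are hypothetical.

```python
# Illustrative only: each entity property is computed from the source record.
# A plain field lookup plays the role of a basic mapping; the sum plays the
# role of a mapping expression that combines two source fields.
MAPPING = {
    "lastname": lambda src: src["surname"],
    "total": lambda src: src["price"] + src["tax"],
}


def apply_mapping(source):
    """Build an entity instance by evaluating every property expression."""
    return {prop: expr(source) for prop, expr in MAPPING.items()}


entity = apply_mapping({"surname": "Jones", "price": 10, "tax": 2})
print(entity)  # prints {'lastname': 'Jones', 'total': 12}
```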

You can also create your own custom functions.

Note: An entity model can have nested properties.
Tip: You can use $URI in your mapping expression to refer to the URI of the document being processed.

Validation of Mapped Entity Instance

To specify whether the mapped entity instance is validated against the schema document based on the entity model, and how validation failures are handled, set the validateEntity option in the mapping step.

  "2" : {
    "name" : "MyMappingStep",
    ...,
    "options" : {
      ...,
      "validateEntity" : "false",
      ...
    }
  },

Field                         Description
"validateEntity" : "false"    The data types are not compared.
"validateEntity" : "accept"   Compares the data types. If the comparison fails, the resulting mapped record is created anyway. Validation errors are logged in the header section of the envelope (envelope.headers.datahub.validationErrors) of the new mapped record. The location where the mapping step failed is also logged.
"validateEntity" : "reject"   Compares the data types. If the comparison fails, the resulting mapped record is not created. Validation errors are logged in the batch document in the JOBS database.

This setting is included in a mapping step generated by the Gradle tasks hubCreateFlow and hubCreateStepDefinition. If your mapping step was created in QuickStart, you can manually add the option in the flow configuration file.
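
The three validateEntity settings amount to a small decision table. The Python sketch below restates the documented outcomes only; it is not Data Hub's implementation, and the function handle_mapped_record, its parameters, and the validationErrors key it writes are hypothetical.

```python
# Illustrative only: mimics the documented outcomes of validateEntity.
def handle_mapped_record(record, errors, validate_entity):
    """Return (record_or_None, where_errors_are_logged_or_None)."""
    if validate_entity == "false":
        return record, None  # data types are not compared
    if not errors:
        return record, None  # comparison succeeded
    if validate_entity == "accept":
        # The record is still created; errors travel with it.
        record = dict(record, validationErrors=list(errors))
        return record, "envelope.headers.datahub.validationErrors"
    if validate_entity == "reject":
        # The record is not created; errors go to the batch document.
        return None, "batch document in the JOBS database"
    raise ValueError(f"unknown validateEntity value: {validate_entity}")


print(handle_mapped_record({"age": "ten"}, ["age is not an integer"], "reject"))
```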