MarkLogic Data Hub 5.1 - Release Notes

Data Hub 5.1.0

Data Hub 5.1.0 includes the following new features and changes:

Mapping

Mapping with XPath Expressions

In the mapping step, entity properties can be assigned values derived from XPath expressions, which can include predefined or custom functions.

This feature requires MarkLogic Server 9.0-11, or 10.0-2 through the latest 10.x release.

See About Mapping, Data Hub Mapping Functions, and Create Custom Mapping Functions.

Mapping Nested Entities in QuickStart

QuickStart now allows mapping to nested entities.

This feature requires MarkLogic Server 9.0-11, or 10.0-2 through the latest 10.x release.

See Complex Entities and Configure a Mapping Step Using QuickStart.

Validation of Mapped Entity Instance

In the mapping step, you can validate the resulting mapped entity instance against the schema document based on the entity model.

See Validation of Mapped Entity Instance.

Mastering

Split Mastering: Matching Step and Merging Step

You can run your mastering process in two separate steps to improve performance: the matching step and the merging step. The split reduces the likelihood that a record is locked when another process needs access to it.

The classic combined mastering (matching and merging in a single step) can still be used for small datasets, but thread count must be set to 1 to avoid locking issues. In most cases, split mastering with multiple threads is ideal.

See About Mastering - Combined-Step versus Split-Step Mastering.
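The thread-count rule above can be expressed as a small check. This is only a sketch: the helper `effective_thread_count` and the step-type labels it takes are hypothetical and are not part of Data Hub. It simply forces a single thread for combined mastering and passes the requested count through for the split steps.

```python
def effective_thread_count(step_type: str, requested: int) -> int:
    """Return the thread count to use for a step (hypothetical helper).

    step_type is an illustrative label: "mastering" stands for the classic
    combined step, "matching" and "merging" for the split steps.
    """
    if requested < 1:
        raise ValueError("thread count must be at least 1")
    if step_type == "mastering":
        # Combined mastering: a single thread avoids locking issues.
        return 1
    # Split steps (and any other step type) may use multiple threads.
    return requested


for step_type in ("mastering", "matching", "merging"):
    print(step_type, effective_thread_count(step_type, 4))
```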

Manual Merging and Unmerging

The following Gradle tasks allow manual merging and unmerging of records:
  • hubMergeEntities
  • hubUnmergeEntities

See Gradle Tasks - Record Management Tasks.

Additional Provenance Information for Mastering

Additional provenance information is stored for mastering to increase transparency in the process.

See Provenance for Mastering.

New REST APIs for Mastering

The following REST APIs related to mastering are added to the Data Hub:
  • mlSmMatch (POST)
  • mlSmMerge (POST)
  • mlSmMerge (DELETE)
  • mlSmNotifications (GET)
  • mlSmHistoryDocument (GET)
  • mlSmHistoryProperties (GET)

See Data Hub Extensions to the REST Client API - Record Management.
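As a rough illustration of how the extensions listed above might be addressed, the sketch below builds (but does not send) one request per API. It assumes the extensions are exposed under the standard REST Client API resource path /v1/resources/{name}; the base URL is a placeholder, and no request parameters or bodies are shown because these notes do not describe them.

```python
from urllib.request import Request

# (extension name, HTTP method) pairs exactly as listed in these notes.
MASTERING_APIS = [
    ("mlSmMatch", "POST"),
    ("mlSmMerge", "POST"),
    ("mlSmMerge", "DELETE"),
    ("mlSmNotifications", "GET"),
    ("mlSmHistoryDocument", "GET"),
    ("mlSmHistoryProperties", "GET"),
]

# Placeholder host and port; substitute your own app server.
BASE_URL = "http://localhost:8011"


def build_request(base_url: str, name: str, method: str) -> Request:
    """Build an unsent request for one of the listed mastering extensions."""
    if (name, method) not in MASTERING_APIS:
        raise ValueError(f"{method} {name} is not a listed mastering API")
    # Assumption: resource extensions live under /v1/resources/{name}.
    return Request(f"{base_url}/v1/resources/{name}", method=method)


for api_name, http_method in MASTERING_APIS:
    req = build_request(BASE_URL, api_name, http_method)
    print(req.get_method(), req.full_url)
```

Authentication, parameters, and payloads depend on your environment and on the API reference cited above.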

Custom Steps

Expanded Custom Step Types in QuickStart

In QuickStart, you can set the Custom Step Type of a custom step to Ingestion, Mapping, Mastering, or Other. The type determines which step template is provided, matching the default step that the custom step would replace. This functionality was available programmatically in previous releases and is now also available in QuickStart.

See Create Step Using QuickStart.

New Custom Step Templates

New templates are generated by the Gradle task hubCreateStepDefinition for each of the step types (ingestion, mapping, mastering, and custom). Each template includes extensive comments to help you customize it for your needs. You can find the templates in the folder your-project-root/step-definitions.

Other Changes

API Prefix Changed from ml: to ml

The API prefix ml: is replaced by ml (without the colon), and the first letter of the API name that follows is capitalized. Example: ml:runFlow is now mlRunFlow.

See Data Hub Extensions to the REST Client API.
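If you have scripts that still refer to the old names, a small helper such as the following could translate them. This is a minimal sketch: the function `to_new_api_name` is hypothetical, and the rule it applies is inferred from the single ml:runFlow example in these notes, so verify each converted name against the API reference.

```python
def to_new_api_name(old_name: str) -> str:
    """Convert an old-style name such as 'ml:runFlow' to 'mlRunFlow'.

    Names that do not start with 'ml:' are returned unchanged.
    """
    prefix = "ml:"
    rest = old_name[len(prefix):]
    if not old_name.startswith(prefix) or not rest:
        return old_name
    # Drop the colon and capitalize the first letter of the API name.
    return "ml" + rest[0].upper() + rest[1:]


print(to_new_api_name("ml:runFlow"))
```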

Release Notes for Earlier Versions