MarkLogic Data Hub - Release Notes

Data Hub 5.0 includes the following new features and changes:

New Flows-and-Steps Architecture

In MarkLogic Data Hub v5.0, a flow is redefined as a series of steps that process your data.

Code templates are provided for basic use cases and can be customized for more complex situations.
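
The flow-as-series-of-steps idea can be illustrated with a small conceptual sketch in Python. It assumes that a record is a plain dict and that each step transforms a record and passes the result to the next step. The Flow and Step classes and the ingest and map_customer functions are hypothetical and are not part of the Data Hub API. Real 5.0 flows are configured in QuickStart and run on the MarkLogic server.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model: a "record" is a plain dict standing in for a document.
Record = Dict[str, object]


@dataclass
class Step:
    """One processing stage, e.g. ingestion, mapping, mastering, or custom."""
    name: str
    step_type: str
    process: Callable[[Record], Record]


@dataclass
class Flow:
    """A flow is an ordered series of steps; each step consumes the previous step's output."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self, records: List[Record]) -> List[Record]:
        for step in self.steps:
            records = [step.process(r) for r in records]
        return records


def ingest(r: Record) -> Record:
    """Hypothetical ingestion step: tag the record as ingested."""
    return {**r, "ingested": True}


def map_customer(r: Record) -> Record:
    """Hypothetical mapping step: rename the source field cust_name to name."""
    out = {k: v for k, v in r.items() if k != "cust_name"}
    out["name"] = r.get("cust_name")
    return out


flow = Flow("CustomerFlow", [Step("ingest", "ingestion", ingest),
                             Step("map", "mapping", map_customer)])
print(flow.run([{"cust_name": "Ada"}]))  # → [{'ingested': True, 'name': 'Ada'}]
```

Because each step only depends on the output of the step before it, steps can be reordered, added, or customized independently, which is the flexibility the 5.0 architecture is aiming for.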

If you are upgrading from 4.x, you can still run your 4.x flows using the Gradle task hubRunLegacyFlow. However, to view or edit them in QuickStart, you must convert them to 5.0 flows and steps.

Introducing Smart Mastering in MarkLogic Data Hub

Data Hub now incorporates MarkLogic Smart Mastering technology, which lets you merge records that refer to the same entity during a mastering step.

In QuickStart, you can configure the match and merge options for a mastering step based on your entity model.
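
The general idea behind match-and-merge can be illustrated with a simplified sketch. This is not Smart Mastering's implementation. It only shows weighted matching with a threshold: each property whose values agree contributes a weight, and records whose total score reaches the threshold are merged. The property names, weights, threshold, and merge rule are all hypothetical.

```python
from typing import Dict

Record = Dict[str, str]

# Hypothetical match options: property name -> weight added when the values agree.
MATCH_WEIGHTS = {"ssn": 50, "last_name": 10, "first_name": 5}
MERGE_THRESHOLD = 50  # hypothetical score at or above which two records are merged


def match_score(a: Record, b: Record) -> int:
    """Sum the weights of properties whose case-insensitive values are equal and non-empty."""
    score = 0
    for prop, weight in MATCH_WEIGHTS.items():
        va = a.get(prop, "").strip().lower()
        vb = b.get(prop, "").strip().lower()
        if va and va == vb:
            score += weight
    return score


def merge(a: Record, b: Record) -> Record:
    """Naive merge rule: prefer a's non-empty values, otherwise keep b's."""
    merged = dict(b)
    merged.update({k: v for k, v in a.items() if v})
    return merged


r1 = {"ssn": "123-45-6789", "first_name": "Ada", "last_name": "Lovelace"}
r2 = {"ssn": "123-45-6789", "last_name": "lovelace", "email": "ada@example.com"}
score = match_score(r1, r2)  # 50 (ssn) + 10 (last_name) = 60
if score >= MERGE_THRESHOLD:
    print(merge(r1, r2))
```

In Data Hub, the equivalent choices (which properties to compare, how much each counts, and how conflicting values are resolved) are made through the mastering step's match and merge options rather than in code.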

Provenance Information for Steps

Data Hub automatically tracks detailed provenance information for all step types (ingestion, mapping, mastering, and custom). This information answers questions such as:

  • When was this entity instance created?
  • From which step was this entity instance created?
  • From which flow was this entity instance created?
  • Which user created this entity instance?
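
To make the kind of information involved concrete, the following hypothetical in-memory sketch captures one field per provenance question. It does not reflect how or where Data Hub actually stores provenance, and the field names and the record_provenance function are assumptions for illustration only.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical shape: one field per provenance question."""
    entity_uri: str  # which entity instance this record describes
    created_at: str  # when was this entity instance created?
    step_name: str   # from which step was it created?
    flow_name: str   # from which flow was it created?
    user: str        # which user created it?


def record_provenance(entity_uri: str, step_name: str, flow_name: str, user: str) -> ProvenanceRecord:
    """Capture provenance at the moment a step writes an entity instance (UTC timestamp)."""
    now = datetime.now(timezone.utc).isoformat()
    return ProvenanceRecord(entity_uri, now, step_name, flow_name, user)


prov = record_provenance("/customer/ada.json", "map", "CustomerFlow", "alice")
print(asdict(prov))
```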

Change to Deployment of Schemas

The local schemas directory, which Data Hub scaffolds to hold schemas that are later deployed to the staging schemas database, has changed:

  • from (old) hub-internal-config/schemas
  • to (new) ml-config/databases/mlStagingSchemasDbName/schemas, where mlStagingSchemasDbName is the value of the mlStagingSchemasDbName property in gradle.properties, if specified. The default is ml-config/databases/data-hub-STAGING-SCHEMAS/schemas.

If you later change the name of the staging schemas database, you must:

  • rename the directory to ml-config/databases/new-staging-schemas-db-name/schemas, and
  • update the value of mlStagingSchemasDbName in gradle.properties.

This change aligns the staging schemas directory path with the ml-gradle convention for database-specific schema directories.
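
The rule for locating the new directory can be expressed as a short Python sketch. The read_properties and staging_schemas_dir helpers are hypothetical and are not part of Data Hub or ml-gradle. The properties reader is deliberately minimal: it handles only key=value lines and comments, not the full .properties syntax (such as ':' separators or line continuations).

```python
from pathlib import Path
from typing import Dict

DEFAULT_STAGING_SCHEMAS_DB = "data-hub-STAGING-SCHEMAS"


def read_properties(text: str) -> Dict[str, str]:
    """Minimal gradle.properties reader: key=value lines, skipping blanks and #/! comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def staging_schemas_dir(project_root: Path, props: Dict[str, str]) -> Path:
    """Resolve ml-config/databases/<mlStagingSchemasDbName>/schemas, using the default name if unset."""
    db_name = props.get("mlStagingSchemasDbName", DEFAULT_STAGING_SCHEMAS_DB)
    return project_root / "ml-config" / "databases" / db_name / "schemas"


root = Path("my-project")
print(staging_schemas_dir(root, read_properties("mlHost=localhost\n")).as_posix())
print(staging_schemas_dir(root, read_properties("mlStagingSchemasDbName=my-staging-schemas\n")).as_posix())
```

Because the directory name is derived from the property, renaming the staging schemas database without also renaming the directory leaves the schemas in a path that no longer matches.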

Mapping and Entities

To simplify customization, QuickStart stores mapping information in a JSON file instead of in generated code. The mapping configuration files are in the your-project-root/mappings directory.

Likewise, entity configuration files have moved to the your-project-root/entities directory.
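
Storing a mapping as data rather than code means it can be read and applied generically. The sketch below assumes a simplified mapping structure in which each target entity property names the source field it is "sourcedFrom". These key names are assumptions for illustration; check the files QuickStart generates in your-project-root/mappings for the actual format. In Data Hub, mappings are applied on the MarkLogic server, not by a script like this.

```python
import json
from typing import Any, Dict

# Hypothetical, simplified mapping configuration stored as JSON.
MAPPING_JSON = """
{
  "name": "CustomerMapping",
  "targetEntityType": "Customer",
  "properties": {
    "customerId": {"sourcedFrom": "id"},
    "fullName":   {"sourcedFrom": "name"}
  }
}
"""


def apply_mapping(mapping: Dict[str, Any], source: Dict[str, Any]) -> Dict[str, Any]:
    """Build an entity instance by copying each sourcedFrom field into its target property."""
    instance: Dict[str, Any] = {}
    for target_prop, rule in mapping["properties"].items():
        source_field = rule["sourcedFrom"]
        if source_field in source:
            instance[target_prop] = source[source_field]
    return {mapping["targetEntityType"]: instance}


mapping = json.loads(MAPPING_JSON)
print(apply_mapping(mapping, {"id": 42, "name": "Ada Lovelace", "extra": "ignored"}))
```

Changing which source field feeds a property then only requires editing the JSON file, not regenerating code.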

Important: If you plan to use Data Hub Service (DHS) with Data Hub v5.0, you must contact Support to upgrade your DHS environment to Data Hub v5.0.