MarkLogic Data Hub 5.0 - Release Notes

Data Hub 5.0.4

Bug fixes.

Data Hub 5.0.3

Data Hub 5.0.3 includes the following new features and changes:

"Off" Setting for Provenance

Provenance can now be turned off completely by setting "provenanceGranularityLevel" : "off" in the flow definition file.

CAUTION: Do not turn off provenance unless you are certain the project will never make use of provenance information.

See Set Provenance Granularity Manually.
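
The setting can also be applied with a short script, which helps when a project has several flows. The following is a minimal sketch, not an official tool. It assumes the flow definition is a JSON file whose "steps" object maps step numbers to step objects, and that the setting belongs in each step's "options" object. Check both assumptions against your own flow file. The function name and the example path are hypothetical.

```python
import json
from pathlib import Path


def set_provenance_level(flow_path, level="off"):
    """Write provenanceGranularityLevel into the options of every step in a flow file."""
    path = Path(flow_path)
    flow = json.loads(path.read_text(encoding="utf-8"))
    # Assumed layout: {"steps": {"1": {"options": {...}}, "2": {...}}}
    for step in flow.get("steps", {}).values():
        step.setdefault("options", {})["provenanceGranularityLevel"] = level
    path.write_text(json.dumps(flow, indent=2) + "\n", encoding="utf-8")
    return flow


# Example call (hypothetical path):
# set_provenance_level("flows/my-flow.flow.json", "off")
```

The script rewrites the file in place, so commit or back up the flow definition before running it.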

Performance Improvements

The performance of the mastering step is significantly improved.

Data Hub 5.0.2

Data Hub 5.0.2 includes the following new features and changes:

Base URI Required for Entities

In previous versions, the base URI property of entities was optional, as it still is in Entity Services.

Now, Data Hub validates the base URI property of entities and makes the property required. It must be in the format http://example.org/.

To add the base URI property to an existing entity definition, do one of the following:

  • Edit the entity in QuickStart.
  • Manually edit the entity definition file in your-project-root/entities to add the baseUri property.
      {
        "info" : {
          "title" : "MyEntity",
          "version" : "0.0.1",
          "baseUri" : "http://example.org/"
        },
        "definitions" : {
          ...
        }
      }
    
Important: If your application uses multiple entity definitions, you must provide a valid base URI in every entity definition used. Otherwise, deploying your application will fail when the entity definitions are loaded into MarkLogic.

Tip: Use the same base URI for related entities.
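
For projects with many entity definitions, the manual edit can be scripted. The following is a minimal sketch under stated assumptions: the entity definitions are JSON files with a .json extension directly under the entities directory, and the same base URI is acceptable for every entity that lacks one. The function name is hypothetical, and http://example.org/ is only the placeholder value from the sample above.

```python
import json
from pathlib import Path


def add_base_uri(entities_dir, base_uri="http://example.org/"):
    """Add info.baseUri to each entity definition that lacks one.

    Returns the names of the files that were changed.
    """
    changed = []
    for path in sorted(Path(entities_dir).glob("*.json")):
        entity = json.loads(path.read_text(encoding="utf-8"))
        info = entity.setdefault("info", {})
        if not info.get("baseUri"):
            # Existing, non-empty baseUri values are left untouched.
            info["baseUri"] = base_uri
            path.write_text(json.dumps(entity, indent=2) + "\n", encoding="utf-8")
            changed.append(path.name)
    return changed


# Example call (hypothetical path):
# add_base_uri("your-project-root/entities")
```

The script only fills in missing values. It does not check that an existing base URI is valid.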

Compliance with Entity Services Format

Manually created or modified entity definitions must now comply with the format defined in the Entity Services documentation, with one exception: Data Hub requires the base URI property.

Entities created or modified inside QuickStart are already compliant.

Support for XQuery Modules in the Gradle Task hubCreateStepDefinition

The Gradle task hubCreateStepDefinition accepts a new option, format. If format is set to xqy (i.e., -Pformat=xqy), Data Hub generates a customizable XQuery sample module and a JavaScript wrapper for it. The XQuery module creates a sample envelope with stub methods for each part of the envelope (e.g., headers, instance, triples).

See hubCreateStepDefinition.
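
As an illustration, the sketch below assembles a command line for this task. Only the task name and -Pformat=xqy come from these notes. The ./gradlew wrapper, the -PstepDefName and -PstepDefType options, and the step name myStep are assumptions made for the example. Confirm the option names in the hubCreateStepDefinition documentation before running the command.

```python
import shlex


def step_definition_command(name, step_type="custom", fmt="xqy", gradle="./gradlew"):
    """Build the argument list for the hubCreateStepDefinition Gradle task."""
    return [
        gradle,
        "hubCreateStepDefinition",
        f"-PstepDefName={name}",      # assumed option name
        f"-PstepDefType={step_type}",  # assumed option name
        f"-Pformat={fmt}",            # option described in these notes
    ]


# Print the command so that it can be reviewed before it is run.
print(shlex.join(step_definition_command("myStep")))
```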

Easier MLCP Ingestion

In QuickStart, the step detail panel for the ingestion step now displays an MLCP command that is prepopulated with the settings you select in the UI.

To use MLCP to ingest your data, copy the entire MLCP command displayed in QuickStart and paste it into a command-line window.

Remember: Replace the password value with the correct password before running the command.

Additional Granularity for Provenance Tracking of Mapping, Mastering, and Custom Steps

By default, Data Hub captures document-level provenance information for all steps. Now, you can also track property-level provenance information for mapping, mastering, and custom steps. See Set Provenance Granularity Manually.

If property-level provenance information is tracked, Data Hub adds provenance properties to the created mapped or merged content descriptors to track information about the original document properties and where those document properties are located in the new documents. This results in multiple provenance documents for each mapped or merged document.

You can also define the specific property-level provenance information to track in a custom step. See Provenance in a Custom Step.

Data Hub 5.0.1

Data Hub 5.0.1 includes stability and usability improvements, as well as bug fixes.

Data Hub 5.0.0

Data Hub 5.0.0 includes the following new features and changes:

New Flows-and-Steps Architecture

In MarkLogic Data Hub v5.0, a flow is redefined as a series of steps that process your data.

Code templates are provided for basic use cases and can be customized for more complex situations.

If you are upgrading from 4.x, you can still execute your old 4.x flows using the Gradle task hubRunLegacyFlow. However, you must convert them to 5.0 flows and steps to be able to view or edit them in QuickStart.

Introducing Smart Mastering in MarkLogic Data Hub

Data Hub now uses the MarkLogic Smart Mastering technology to allow you to merge records that refer to the same entity during a mastering step.

Using QuickStart, you can configure the match-and-merge options of this new feature based on your entity model.

Provenance Information for Steps

Detailed provenance information is automatically tracked for all types of steps (ingestion, mapping, mastering, and custom). Provenance information answers questions such as:

  • When was this entity instance created?
  • From which step was this entity instance created?
  • From which flow was this entity instance created?
  • Which user created this entity instance?

Change to Deployment of Schemas

The local schemas directory, which is scaffolded to contain schemas that are later deployed to the staging schemas database, has changed:

  • from (old) hub-internal-config/schemas
  • to (new) ml-config/databases/mlStagingSchemasDbName/schemas, where mlStagingSchemasDbName is the value of the mlStagingSchemasDbName property in gradle.properties, if specified. The default is ml-config/databases/data-hub-STAGING-SCHEMAS/schemas.

If you later change the name of the staging schemas database, you must:

  • rename the directory to ml-config/databases/new-staging-schemas-db-name/schemas and
  • update the value of mlStagingSchemasDbName in gradle.properties.

This change aligns the staging schemas directory path with ml-gradle.
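
The path rule described in this section can be sketched as a small helper that reads gradle.properties and returns the expected schemas directory. This is an illustrative sketch, not part of Data Hub. The function name is hypothetical, and the parser assumes simple key=value lines in gradle.properties at the project root.

```python
from pathlib import Path

DEFAULT_DB_NAME = "data-hub-STAGING-SCHEMAS"


def staging_schemas_dir(project_root):
    """Return the local schemas directory for the staging schemas database."""
    root = Path(project_root)
    db_name = DEFAULT_DB_NAME
    props = root / "gradle.properties"
    if props.is_file():
        for line in props.read_text(encoding="utf-8").splitlines():
            key, sep, value = line.partition("=")
            # Commented-out lines start with "#", so their key never matches.
            if sep and key.strip() == "mlStagingSchemasDbName" and value.strip():
                db_name = value.strip()
    return root / "ml-config" / "databases" / db_name / "schemas"


# Example call (hypothetical path):
# print(staging_schemas_dir("your-project-root"))
```

After you rename the staging schemas database, the helper shows the directory name that the property value now implies, which you can compare with the directory on disk.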

Mapping and Entities

To simplify customization, QuickStart stores mapping information in a JSON file, instead of generated code. The mapping configuration files are in the your-project-root/mappings directory.

Likewise, entity configuration files have moved to the your-project-root/entities directory.

Important: If you plan to use Data Hub Service with Data Hub v5.0, you must contact Support to upgrade your DHS environment to use Data Hub v5.0.