MarkLogic Data Hub 5.0 - Release Notes
Data Hub 5.0.4
Bug fixes.
Data Hub 5.0.3
Data Hub 5.0.3 includes the following new features and changes:
"Off" Setting for Provenance
Provenance can now be turned off completely by setting "provenanceGranularityLevel" : "off" in the flow definition file.
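As a rough illustration, the setting could be applied to an existing flow file with a short script. This is only a sketch: it assumes the flow file keeps its steps in a "steps" object keyed by step number and that each step holds its settings in an "options" object. Neither detail is stated in these notes, so check your own flow file first. The function name turn_off_provenance is hypothetical.

```python
import json
from pathlib import Path


def turn_off_provenance(flow_path):
    """Set provenanceGranularityLevel to "off" for every step in a flow file.

    Assumption (not confirmed by these release notes): the flow file has a
    "steps" object keyed by step number, and each step stores its settings
    in an "options" object.
    """
    path = Path(flow_path)
    flow = json.loads(path.read_text(encoding="utf-8"))
    for step in flow.get("steps", {}).values():
        # Create "options" if the step has none, then set the level.
        step.setdefault("options", {})["provenanceGranularityLevel"] = "off"
    path.write_text(json.dumps(flow, indent=2) + "\n", encoding="utf-8")
    return flow
```

Back up the flow file before running a script like this, because it rewrites the file in place.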
Performance Improvements
The performance of the mastering step is significantly improved.
Data Hub 5.0.2
Data Hub 5.0.2 includes the following new features and changes:
Base URI Required for Entities
In previous versions, the base URI property of entities was optional, as it still is in Entity Services.
Now, Data Hub validates the base URI property of entities and makes the property required. It must be in the format http://example.org/.
To add the base URI property to existing entity definitions,
- you can edit the entity in QuickStart, or
- you can manually edit the entity definition file in your-project-root/entities to add the baseUri property.
  {
    "info" : {
      "title" : "MyEntity",
      "version" : "0.0.1",
      "baseUri" : "http://example.org/"
    },
    "definitions" : { ... }
  }
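If a project has many entity definitions, the manual edit can be scripted. The following is a minimal sketch, not a Data Hub tool: the function name add_base_uri, the "*.json" file pattern, and the URL check (an http or https URL ending in a slash) are assumptions made for illustration. Only the "info" and "baseUri" property names come from the example above.

```python
import json
import re
from pathlib import Path

# Assumption: a valid base URI looks like http://example.org/
# (an http or https URL that ends with a slash).
BASE_URI_PATTERN = re.compile(r"^https?://\S+/$")


def add_base_uri(entities_dir, base_uri):
    """Add info.baseUri to each entity definition file that lacks it.

    Returns the names of the files that were changed.
    """
    if not BASE_URI_PATTERN.match(base_uri):
        raise ValueError(f"base URI must look like http://example.org/: {base_uri!r}")
    updated = []
    # Assumption: entity definition files are JSON files directly in entities_dir.
    for path in sorted(Path(entities_dir).glob("*.json")):
        entity = json.loads(path.read_text(encoding="utf-8"))
        info = entity.setdefault("info", {})
        if "baseUri" not in info:
            info["baseUri"] = base_uri
            path.write_text(json.dumps(entity, indent=2) + "\n", encoding="utf-8")
            updated.append(path.name)
    return updated
```

For example, add_base_uri("your-project-root/entities", "http://example.org/") leaves files that already define baseUri untouched.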
Compliance with Entity Services Format
Manually created or modified entity definitions must now comply with the required format as defined in the Entity Services documentation, except that Base URI is required by Data Hub.
Entities created or modified inside QuickStart are already compliant.
Support for XQuery Modules in the Gradle Task hubCreateStepDefinition
The Gradle task hubCreateStepDefinition accepts a new option format. If set to xqy (i.e., -Pformat=xqy), Data Hub generates a customizable XQuery sample module and a JavaScript wrapper for it. The XQuery module creates a sample envelope with stub methods for each part of the envelope (e.g., headers, instance, triples).
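To show how the option fits into a full invocation, the sketch below assembles the command line for the task. Only the task name and -Pformat=xqy come from these notes. The -PstepDefName and -PstepDefType properties, and the helper name step_definition_command, are assumptions here, so confirm them against the Gradle task documentation for your Data Hub version.

```python
def step_definition_command(step_def_name, step_def_type, fmt="xqy"):
    """Build the argument list for the hubCreateStepDefinition Gradle task.

    -Pformat is described in these release notes. The stepDefName and
    stepDefType properties are assumptions and may differ in your version.
    """
    return [
        "gradle",
        "hubCreateStepDefinition",
        f"-PstepDefName={step_def_name}",
        f"-PstepDefType={step_def_type}",
        f"-Pformat={fmt}",
    ]
```

Joining the result of step_definition_command("myCustomStep", "custom") with spaces gives a line you can paste into a terminal, or the list can be passed directly to subprocess.run.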
Easier MLCP Ingestion
In QuickStart, the step detail panel for the ingestion step now displays an MLCP command that is prepopulated with the settings you select in the UI.
To use MLCP to ingest your data, copy the entire MLCP command displayed in QuickStart and paste it into a command-line window.
Additional Granularity for Provenance Tracking of Mapping, Mastering, and Custom Steps
By default, Data Hub captures document-level provenance information for all steps. Now, you can also track property-level provenance information for mapping, mastering, and custom steps. See Set Provenance Granularity Manually.
If property-level provenance information is tracked, Data Hub adds provenance properties to the created mapped or merged content descriptors to track information about the original document properties and where those document properties are located in the new documents. This results in multiple provenance documents for each mapped or merged document.
You can also define the specific property-level provenance information to track in a custom step. See Provenance in a Custom Step.
Data Hub 5.0.1
Data Hub 5.0.1 includes stability and usability improvements, as well as bug fixes.
Data Hub 5.0.0
Data Hub 5.0.0 includes the following new features and changes:
New Flows-and-Steps Architecture
In MarkLogic Data Hub v5.0, a flow is redefined as a series of steps that process your data.
Code templates are provided for basic use cases and can be customized for more complex situations.
If you are upgrading from 4.x, you can still execute your old 4.x flows using the Gradle task hubRunLegacyFlow. However, you must convert them to 5.0 flows and steps to be able to view or edit them in QuickStart.
Introducing Smart Mastering in MarkLogic Data Hub
Data Hub now uses the MarkLogic Smart Mastering technology to allow you to merge records that refer to the same entity during a mastering step.
Using QuickStart, you can configure the match-and-merge options of this new feature based on your entity model.
Provenance Information for Steps
Detailed provenance information is automatically tracked for all types of steps (ingestion, mapping, mastering, and custom). Provenance information answers questions such as:
- When was this entity instance created?
- From which step was this entity instance created?
- From which flow was this entity instance created?
- Which user created this entity instance?
Change to Deployment of Schemas
The staging schemas directory has moved
- from (old) hub-internal-config/schemas
- to (new) ml-config/databases/mlStagingSchemasDbName/schemas, where mlStagingSchemasDbName is the value of the mlStagingSchemasDbName property in gradle.properties, if specified. Default is ml-config/databases/data-hub-STAGING-SCHEMAS/schemas.
If you change the name of the staging schemas database,
- rename the directory to ml-config/databases/new-staging-schemas-db-name/schemas and
- update the value of mlStagingSchemasDbName in gradle.properties.
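If you need to move existing schema files by hand, the path change can be sketched as a small script. This is an illustration under stated assumptions, not part of Data Hub: the function names are hypothetical, config_root stands for whichever directory in your project contains hub-internal-config and ml-config, and gradle.properties is read with a simple key=value parser rather than a full Java properties parser.

```python
import shutil
from pathlib import Path

DEFAULT_DB_NAME = "data-hub-STAGING-SCHEMAS"  # default named in these notes


def staging_schemas_db_name(gradle_properties):
    """Return mlStagingSchemasDbName from gradle.properties, or the default."""
    name = DEFAULT_DB_NAME
    props = Path(gradle_properties)
    if props.is_file():
        for line in props.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            if key.strip() == "mlStagingSchemasDbName":
                name = value.strip()  # the last definition wins
    return name


def move_staging_schemas(config_root, gradle_properties):
    """Move hub-internal-config/schemas contents to the new ml-config path.

    Returns the new directory, or None if the old directory does not exist.
    """
    root = Path(config_root)
    old_dir = root / "hub-internal-config" / "schemas"
    db_name = staging_schemas_db_name(gradle_properties)
    new_dir = root / "ml-config" / "databases" / db_name / "schemas"
    if not old_dir.is_dir():
        return None
    new_dir.mkdir(parents=True, exist_ok=True)
    for item in old_dir.iterdir():
        target = new_dir / item.name
        if target.exists():
            # Refuse to overwrite anything already in the new location.
            raise FileExistsError(f"refusing to overwrite {target}")
        shutil.move(str(item), str(target))
    return new_dir
```

The script refuses to overwrite files that already exist in the new location, so it is safe to rerun after resolving a conflict.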
This change aligns the staging schemas directory path with ml-gradle.
Mapping and Entities
To simplify customization, QuickStart stores mapping information in a JSON file, instead of generated code. The mapping configuration files are in the your-project-root/mappings directory.
Likewise, entity configuration files moved to the your-project-root/entities directory.