MarkLogic Data Hub 5.5 - Release Notes

Data Hub 5.5.5

Bug fixes.

Note: To learn more about the details of this release, see GitHub Release Notes for Data Hub 5.5.5.

Data Hub 5.5.4

Bug fixes.

Note: To learn more about the details of this release, see GitHub Release Notes for Data Hub 5.5.4.

Data Hub 5.5.3

Data Hub 5.5.3 includes the following new features and changes:

Hub Central

New Behavior Change to Configure Matching

The following changes were made to the configure match thresholds scale:

  • Click the Enable Threshold Scale toggle switch to position, edit, or delete thresholds.

The following changes were made to the configure match rulesets scale:

  • Click the Enable Ruleset Scale toggle switch to position, edit, or delete rulesets.
  • The weight is displayed as a negative integer if Reduce Weight is enabled for a ruleset.

See Configure Matching Using Hub Central.

Bug fixes.

Note: To learn more about the details of this release, see GitHub Release Notes for Data Hub 5.5.3.

Data Hub 5.5.2

Bug fixes.

Note: To learn more about the details of this release, see GitHub Release Notes for Data Hub 5.5.2.

Data Hub 5.5.1

Bug fixes.

Note: To learn more about the details of this release, see GitHub Release Notes for Data Hub 5.5.1.

Data Hub 5.5.0

Data Hub 5.5.0 includes the following new features and changes:

Important: Upgrading to this release triggers a reindexing of the STAGING and FINAL databases. Learn more about how reindexing works and its impact on performance.

QuickStart is now deprecated. Hub Central supports all the same functionality and should be used instead. As of 5.5, match and merge steps can only be configured and run in Hub Central.

Custom hooks are now deprecated, and interceptors should be used instead.

General Enhancements

A new option enables running a step interceptor before the primary function is invoked (to do this, specify "when" as "beforeMain"). See Step interceptors.
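As a minimal sketch of what such a configuration might look like, the Python snippet below builds a step fragment with one interceptor and prints it as JSON. Only the "when": "beforeMain" setting comes from the note above. The step name, the module path, the "vars" contents, and the surrounding property names are illustrative assumptions and should be checked against Step interceptors.

```python
import json

# Hypothetical step fragment. Only "when": "beforeMain" is taken from the
# release note; every other name and value here is an illustrative assumption.
step = {
    "name": "exampleMappingStep",
    "interceptors": [
        {
            # Assumed path of a custom module deployed to the modules database.
            "path": "/custom/interceptors/exampleInterceptor.sjs",
            # Run the interceptor before the step's primary function is invoked.
            "when": "beforeMain",
            # Assumed way of passing values to the interceptor.
            "vars": {"exampleVariable": "exampleValue"},
        }
    ],
}

step_json = json.dumps(step, indent=2)
print(step_json)
```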

Support is now included for testing Data Hub steps with marklogic-unit-test and JUnit 5.

You can ingest and run a Data Hub flow of steps in a single call via a new REST extension or MLCP. When you specify more than one step, the output of one step is the input to the next step. This is a more performant way to ingest and run multiple steps since it involves only one call to MarkLogic.
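The chaining behavior described above can be sketched in a few lines of Python. This is a conceptual model only, not the Data Hub implementation or its REST interface: run_steps, wrap_step, and tag_step are hypothetical names, and the content shape (a "uri" plus a "value") is an assumption made for illustration.

```python
from typing import Any, Callable, Dict, List

Content = Dict[str, Any]
Step = Callable[[List[Content]], List[Content]]


def run_steps(content: List[Content], steps: List[Step]) -> List[Content]:
    """Feed the output of each step into the next step, in order."""
    for step in steps:
        content = step(content)
    return content


def wrap_step(content: List[Content]) -> List[Content]:
    # Hypothetical first step: wrap each raw value in an "instance" property.
    return [{"uri": c["uri"], "value": {"instance": c["value"]}} for c in content]


def tag_step(content: List[Content]) -> List[Content]:
    # Hypothetical second step: mark each instance produced by the first step.
    out = []
    for c in content:
        instance = dict(c["value"]["instance"], status="mapped")
        out.append({"uri": c["uri"], "value": {"instance": instance}})
    return out


result = run_steps(
    [{"uri": "/customer/1.json", "value": {"name": "Ada"}}],
    [wrap_step, tag_step],
)
print(result)
```

Because the intermediate content is handed directly from one step to the next, nothing needs to be read back between steps, which is the source of the performance benefit the note describes.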

You can now facet on structured types in Hub Central's Explore feature.

When Data Hub writes documents via a step, documents are now written to the user’s default collections.

You can now set ‘quality’ on documents created by a step to control the relevance score of those documents in text searches.

Mapping Enhancements

You can now map multiple entities from a single source document within the same step. See Mapping Enhancements.

A new Attach Source Document field in mapping settings lets you specify whether the source document should be copied into the mapped entity instance. A new URI field in every mapping configuration lets you define a URI template as a mapping expression. A new mapping function called hubURI generates a UUID and prefixes the name of the specified entity type. The function signature is hubURI(entityType).
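To illustrate the idea behind hubURI(entityType), the following Python sketch generates a UUID and prefixes it with the entity type name. It is an emulation under stated assumptions, not the Data Hub function: hub_uri_sketch is a hypothetical name, and the leading slash, the "/" separator, and the ".json" suffix are guesses at a plausible layout.

```python
import uuid


def hub_uri_sketch(entity_type: str) -> str:
    """Return a UUID-based URI prefixed with the entity type name."""
    if not entity_type:
        raise ValueError("entity_type must be a non-empty string")
    # Assumed layout; the real function's separator and suffix may differ.
    return f"/{entity_type}/{uuid.uuid4()}.json"


uri = hub_uri_sketch("Customer")
print(uri)
```

Each call produces a different URI, so a template of this kind avoids collisions when several entity instances are mapped from the same source document.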

You can define custom parameters that are referenced from a mapping expression.

Mastering Enhancements

  • Configure match and merge steps in Hub Central for structured properties.
  • Test your matching configuration. View total match scores and a breakdown of the match contributions for each property. View documents side by side to compare similarities and differences. See Mastering Enhancements.

Monitoring Enhancements

  • Monitor the steps and flows that have been run in Data Hub via Hub Central (facet, filter, sort).
  • A new jobs REST extension accepts additional parameters. See Data Hub Extensions to the REST Client API.
  • Provenance capture is now turned off by default for newly created steps.

New Step Options

Among the new step options in 5.5, writeStepOutput defaults to true. If set to false, the content objects output by a step are not persisted. Typically, this is only useful when running multiple steps on ingest. See Steps.

A step has a "type" that is defined by the step definition with which it is associated. The following new properties apply to a step regardless of its type:

Property | Value | Required | Description
enableBatchOutput | string | No | If "never", a batch document is never created when the step is run. If "onFailure", a batch document is created only if an error occurs for a batch when the step is run. Otherwise, a batch document is created for the step, unless "disableJobOutput" is set to "true" as a flow option or a runtime option.
targetCollectionsAdditivity | boolean | No | Defaults to false. If set to true, then for any content object returned by a step that was also an input content object, its original collections are retained.
writeStepOutput | boolean | No | Defaults to true. If set to false, the content objects output by a step are not persisted. Typically only useful when running multiple steps on ingest.
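The decision rules in the table can be expressed as a short Python sketch. It follows a literal reading of the descriptions above and is not Data Hub code: both function names are hypothetical, and two points are assumptions, namely that "disableJobOutput" only matters when enableBatchOutput is neither "never" nor "onFailure", and that retained original collections are combined with the step's target collections.

```python
from typing import Iterable, List, Optional


def should_write_batch_document(
    enable_batch_output: Optional[str],
    batch_failed: bool,
    disable_job_output: bool = False,
) -> bool:
    """Decide whether a batch document is created, per the table's wording."""
    if enable_batch_output == "never":
        return False
    if enable_batch_output == "onFailure":
        # Only batches that hit an error produce a batch document.
        return batch_failed
    # Any other value: create one unless job output is disabled (assumed scope).
    return not disable_job_output


def resolve_collections(
    original: Iterable[str],
    target: Iterable[str],
    additivity: bool = False,
) -> List[str]:
    """Return the collections for a content object that was also a step input."""
    if additivity:
        # Keep the original collections and add the target ones, without duplicates.
        return list(dict.fromkeys([*original, *target]))
    return list(target)


print(should_write_batch_document("onFailure", batch_failed=True))
print(resolve_collections(["raw"], ["mapped"], additivity=True))
```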

Release Notes for Earlier Versions