Merging

Overview of merging in Data Hub.

About Merging in MarkLogic Data Hub

Merging processes the candidate records according to the threshold that each set of candidates meets. Merging rules define how two or more records are combined. Merging is non-destructive: a new record is created with the combined contents of the original records, according to the merging rules you create, and the original records stay in the database and are tagged as archived.
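
The non-destructive behavior can be pictured with a minimal Python sketch. This is only an illustration, not Data Hub's implementation: the record layout (`uri`, `content`, `collections`), the `merge_non_destructively` helper, and the `merged` and `archived` tag names are all assumptions made for the example.

```python
from copy import deepcopy

def merge_non_destructively(originals, combine):
    """Build a new merged record and archived copies of the originals.

    `originals` is a list of dicts with hypothetical keys "uri", "content",
    and "collections". `combine` is any function that turns a list of
    content dicts into one combined content dict.
    """
    merged = {
        "content": combine([record["content"] for record in originals]),
        "collections": {"merged"},  # placeholder tag, not a Data Hub name
        "merged_from": [record["uri"] for record in originals],
    }
    archived = []
    for record in originals:
        kept = deepcopy(record)  # the inputs themselves are never mutated
        kept["collections"] = set(record["collections"]) | {"archived"}
        archived.append(kept)
    return merged, archived

def union_of_properties(contents):
    """Simplest possible combine step: later records overwrite earlier ones."""
    combined = {}
    for content in contents:
        combined.update(content)
    return combined
```

The originals are copied and tagged rather than deleted, so the merged record can always be traced back to its inputs.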

  • Merge options define how the properties of the candidate records are combined.
  • Merge strategies are sets of merge options that you can name and reuse.
  • Merge collections are sets of records that have the same collection tags.
Note: You can also implement your own merge logic in a custom module.

The merging configuration is stored as properties at the root of the merging step document. See Merging Step Configuration Structure.

Rules

In a merge, a new record is created, and the values from the original records can be combined and copied to the new record according to the rules you specify. For example:

  • You can restrict the number of unique values copied to the new record.
  • You can restrict the number of data sources from which to copy values.
  • You can specify that only records from specific datasets can be merged, and you can assign a weight to each source to give priority to more reliable sources.
  • You can also assign a weight to the length of a string.
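
As a rough illustration of how such rules might interact, the following Python sketch ranks candidate values by a source weight plus a length weight, then applies limits on the number of values and sources. The function name, its parameters, and the scoring formula (source weight plus length weight times string length) are assumptions for this example and do not describe Data Hub's actual algorithm.

```python
def select_values(candidates, source_weights=None, length_weight=0,
                  max_values=None, max_sources=None):
    """Pick the values to copy into the merged record.

    `candidates` is a list of (value, source_name) pairs. All parameter
    names and the scoring formula are hypothetical.
    """
    source_weights = source_weights or {}

    def score(item):
        value, source = item
        # Assumed scoring: weight of the source plus a bonus for longer strings.
        return source_weights.get(source, 0) + length_weight * len(str(value))

    # sorted() is stable, so ties keep their original order.
    ranked = sorted(candidates, key=score, reverse=True)

    selected, used_sources = [], []
    for value, source in ranked:
        new_source = source not in used_sources
        new_value = value not in selected
        if new_source and max_sources is not None and len(used_sources) >= max_sources:
            continue  # taking this value would exceed the source limit
        if new_value and max_values is not None and len(selected) >= max_values:
            continue  # taking this value would exceed the value limit
        if new_source:
            used_sources.append(source)
        if new_value:
            selected.append(value)
    return selected
```

For example, with candidates from three hypothetical sources `crm`, `erp`, and `web`, giving `erp` the highest weight and setting `max_values=1` keeps only the value that came from `erp`.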

Merge rules define how entity instance properties or document nodes are selected for the composite entity instance in the new merged document. Two merge types are available: standard and custom.

By default, properties are sorted by their last updated DateTime in descending order, that is, most recently updated first.
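
A minimal Python sketch of this ordering follows. The `last_updated` key and the ISO 8601 timestamp format are assumptions made for the example; they are not Data Hub field names.

```python
from datetime import datetime

def sort_by_last_updated(values):
    """Return property values ordered newest first.

    Each item is a dict with a hypothetical "last_updated" key holding an
    ISO 8601 timestamp string.
    """
    return sorted(
        values,
        key=lambda item: datetime.fromisoformat(item["last_updated"]),
        reverse=True,  # descending: the most recent update comes first
    )
```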

A merging rule can target values in a record in two ways:

  • entityPropertyPath uses dot notation to indicate the location of a property in a record, chaining the property titles together with periods.
  • documentXPath is an XPath expression that targets the location of a node in the record.
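
The difference between the two targeting styles can be sketched in Python. The helper functions and sample records below are hypothetical and only illustrate the idea; Python's `xml.etree.ElementTree` supports just a subset of XPath with relative paths, so the expressions accepted by Data Hub may look different.

```python
import xml.etree.ElementTree as ET

def resolve_property_path(instance, path):
    """Follow a dot-notation path such as "address.city" through nested dicts.

    Returns None when any property title along the path is missing.
    """
    current = instance
    for title in path.split("."):
        if not isinstance(current, dict) or title not in current:
            return None
        current = current[title]
    return current

def resolve_xpath(root, xpath):
    """Return the text of every node matched by a (limited) XPath expression."""
    return [node.text for node in root.findall(xpath)]

# Sample data invented for the example.
json_record = {"name": "Ann", "address": {"city": "Oslo", "zip": "0150"}}
xml_record = ET.fromstring(
    "<customer><name>Ann</name><address><city>Oslo</city></address></customer>"
)

city_by_path = resolve_property_path(json_record, "address.city")
city_by_xpath = resolve_xpath(xml_record, "./address/city")
```

Dot notation follows the property titles of the entity model, while XPath addresses nodes in the stored document directly.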

Strategies

If you reuse a particular combination of merge settings, you can save it as a strategy and refer to it by the strategy name.
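
The idea of referring to a strategy by name can be sketched as a simple lookup. The dictionary keys (`strategy`, `max_values`, `source_weights`, `path`) and the `resolve_rule` helper are assumptions for illustration and do not reflect the actual property names of the merging step configuration.

```python
# Hypothetical named strategies: each is a reusable set of merge settings.
STRATEGIES = {
    "prefer-crm": {"max_values": 1, "source_weights": {"crm": 10}},
}

def resolve_rule(rule, strategies):
    """Expand a rule that names a strategy into its concrete settings.

    Rules without a "strategy" key are returned unchanged (as a copy).
    """
    name = rule.get("strategy")
    if name is None:
        return dict(rule)
    if name not in strategies:
        raise KeyError(f"Unknown merge strategy: {name}")
    # Keep the rule's own keys (except the reference) and add the settings.
    resolved = {key: value for key, value in rule.items() if key != "strategy"}
    resolved.update(strategies[name])
    return resolved
```

Several rules can then point at the same strategy name, so a change to the strategy applies to every rule that uses it.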