Create a Merging Step Using Hub Central
Before you begin
You need:
- Security role: Hub Central Developer or any role that inherits it. Learn more: Users and Roles
Procedure
- Go to the Curate area of Hub Central.
- In the list, expand the entity type you want to use.
- Click the Merge tab, then click the Add New tile.
- Configure the step's basic settings.
  - Name: The name of the step instance. Note: The step name is used as part of the names of related assets, as a collection name to tag processed documents, and as metadata in provenance and lineage logs. Therefore, it cannot be changed after the step is created.
  - Description: (Optional) A description of the step.
  - Source Query: The collection or CTS query that selects the source data to process. Learn more: CTS Query.
  - Timestamp Path: The path to a timestamp field within the record. This field is used to determine which values to include in the merged property, based on their recency, up to the maximum number specified in the Max Values field in Merge Options (Standard) or in Merge Strategies. Namespaces used in the path must be defined within the record.
- Go to the Advanced tab.
- Configure the step's advanced settings.
  - Source Database: The database from which to take the input data. Choose the same source database that you selected in the matching step. The default is data-hub-FINAL.
  - Target Database: The same database you selected in Source Database. The default is data-hub-FINAL. Important: For split mastering (matching step and merging step), both the source database and the target database for both steps must be the same.
  - Target Collections: Comma-separated string containing additional collection tags to apply to the processed records. The Default Collections tags are automatically applied.
  - Target Permissions: The permissions required to access the documents created by the step. The string must be in the format role,capability,role,capability,..., where capability can be read, insert, update, or execute.
  - Provenance Granularity: The granularity of the provenance tracking information: coarse (default) to store document-level provenance information only, fine to store document-level and property-level provenance information, or off to disable provenance tracking in future job runs. Applies only to mapping, matching, merging, mastering, and custom steps. Learn more: About Provenance and Lineage.
  - Batch Size: The number of documents to process per batch. A smaller batch size provides finer granularity in job reporting. However, a smaller batch size also costs more because of the processing overhead. The recommended batch size for merging is 1.
  - Interceptors: An array of JSON objects specifying the custom modules that perform additional processes on a batch after the core step processes are completed and before the results are saved in the database. Syntax:
    [
      {
        "path": "/uri/of/custom/module/in/modules/database/a.sjs",
        "vars": { "myParameter": "myParameterValue" },
        "when": "beforeContentPersisted"
      }
    ]
    - path: The URI of the interceptor in the MODULES database.
    - vars: (Optional) A JSON object containing parameters to pass to the interceptor.
    - when: Currently, only beforeContentPersisted is supported.
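The interceptor syntax above can be generated and sanity-checked before it is pasted into the Interceptors field. The following Python sketch is illustrative only: make_interceptor and interceptors_json are hypothetical helpers, not part of Data Hub, and the module path is the placeholder from the syntax example.

```python
import json

# The only "when" value that the description above lists as supported.
SUPPORTED_WHEN = {"beforeContentPersisted"}


def make_interceptor(path, when="beforeContentPersisted", variables=None):
    """Build one interceptor object (hypothetical helper, not a Data Hub API)."""
    if not path:
        raise ValueError("path must be a non-empty module URI")
    if when not in SUPPORTED_WHEN:
        raise ValueError(f"unsupported 'when' value: {when!r}")
    interceptor = {"path": path, "when": when}
    if variables:
        # "vars" is optional, so it is omitted when no parameters are passed.
        interceptor["vars"] = dict(variables)
    return interceptor


def interceptors_json(interceptors):
    """Serialize a list of interceptor objects as a JSON array."""
    return json.dumps(list(interceptors), indent=2)


if __name__ == "__main__":
    example = make_interceptor(
        "/uri/of/custom/module/in/modules/database/a.sjs",
        variables={"myParameter": "myParameterValue"},
    )
    print(interceptors_json([example]))
```

Serializing with json.dumps guarantees well-formed JSON, which avoids missing or trailing commas when the array is edited by hand.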
  - Custom Hook: A step add-on that performs additional processes in its own transaction before or after the core step transaction. Results are saved within a transaction. Syntax:
    {
      "module": "/uri/of/custom/module/in/modules/database/a.sjs",
      "parameters": { "myParameter": "myParameterValue" },
      "user": "account-required-to-run-module",
      "runBefore": false
    }
    - module: The URI of your custom hook module in the MODULES database.
    - parameters: (Optional) A JSON object containing parameters to pass to your custom hook module.
    - user: The user account to use to run the module. Typically, a user with the security role data-hub-operator.
    - runBefore: For a pre-step hook, set to true. For a post-step hook, set to false.
Learn more: Creating a Custom Hook Module and Adding a Custom Hook to a Step Manually.
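The Target Permissions string described in the advanced settings alternates role names and capabilities. As a rough illustration, the Python sketch below parses and validates such a string before it is entered in Hub Central. parse_permissions and format_permissions are hypothetical helpers, not part of Data Hub, and the role used in the usage example is only a placeholder.

```python
# Capabilities listed in the Target Permissions description.
CAPABILITIES = {"read", "insert", "update", "execute"}


def parse_permissions(value):
    """Split 'role,capability,role,capability,...' into (role, capability) pairs."""
    tokens = [token.strip() for token in value.split(",")]
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of comma-separated items")
    # Even positions hold role names, odd positions hold capabilities.
    pairs = list(zip(tokens[0::2], tokens[1::2]))
    for role, capability in pairs:
        if not role:
            raise ValueError("role name must not be empty")
        if capability not in CAPABILITIES:
            raise ValueError(f"unknown capability: {capability!r}")
    return pairs


def format_permissions(pairs):
    """Join (role, capability) pairs back into the comma-separated format."""
    return ",".join(f"{role},{capability}" for role, capability in pairs)


if __name__ == "__main__":
    # Placeholder role name, used here for illustration only.
    pairs = parse_permissions("data-hub-operator,read,data-hub-operator,update")
    print(pairs)
    print(format_permissions(pairs))
```

A check like this catches an odd number of items or a misspelled capability early, rather than when the step runs.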