Data Hub Extensions to the REST Client API

This page lists the Data Hub extensions to the MarkLogic REST Client API.

Flow Management

mlRunFlow (POST)

Runs a single step within the flow to process the specified records.

POST /v1/resources/mlRunFlow?rs:job-id=YourJobID&rs:flow-name=YourFlowName&rs:step=1&database=YourSourceDatabase&rs:target-database=YourTargetDatabase&rs:options={"uris":["comma-separated","uris","of","records","to","process"]}
rs:job-id
A unique job ID to associate with the flow run. This option can be used if the flow run is part of a larger process (e.g., a process orchestrated by NiFi with its own job/process ID). Must not be the same as an existing Data Hub job ID. If not provided, a unique Data Hub job ID will be assigned.
rs:flow-name
The name of the flow.
rs:step
The sequence number of the step to execute. To run multiple specific steps, use your orchestration tool to send one mlRunFlow request for each step.
database
The database containing the source data; e.g., data-hub-STAGING.
rs:target-database
The database where you want to store the processed data; e.g., data-hub-FINAL.
rs:options
A JSON object containing additional options to pass to the flow.
  • To specify the list of records to process, add the key uris whose value is an array of the URIs of the records to process.
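As a rough illustration, the mlRunFlow request could be assembled in Python as sketched here. This is not part of Data Hub: the helper name build_run_flow_url, the base URL http://localhost:8010, and the sample record URIs are assumptions chosen for the example. The main point is that the JSON value of rs:options must be URL-encoded before it goes into the query string.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the app server; replace with your own host and port.
BASE_URL = "http://localhost:8010"

def build_run_flow_url(flow_name, step, source_db, target_db, uris, job_id=None):
    """Build the mlRunFlow URL (hypothetical helper, for illustration only)."""
    params = []
    # rs:job-id is optional; Data Hub assigns a job ID when it is omitted.
    if job_id is not None:
        params.append(("rs:job-id", job_id))
    params += [
        ("rs:flow-name", flow_name),
        ("rs:step", str(step)),
        ("database", source_db),
        ("rs:target-database", target_db),
        # The options object is serialized to JSON, then percent-encoded by urlencode.
        ("rs:options", json.dumps({"uris": list(uris)})),
    ]
    return f"{BASE_URL}/v1/resources/mlRunFlow?{urlencode(params)}"

url = build_run_flow_url(
    "YourFlowName", 1, "data-hub-STAGING", "data-hub-FINAL",
    ["/customers/1.json", "/customers/2.json"],
)
print(url)
```

The resulting URL would then be sent as a POST with whatever HTTP client and authentication your environment requires.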

Record Management

MarkLogic Data Hub provides REST Client API extensions that allow you to match, merge, and unmerge records programmatically without running a flow.

mlSmMatch (POST)

Compares the specified record with other records and returns the list of possible matches.

POST /v1/resources/mlSmMatch?rs:uri=URIofFocusRecord&rs:flowName=YourFlowName&rs:step=1&rs:includeMatchDetails=[true|false]
rs:uri
(Required) The URI of the record to compare with other records.
rs:flowName
(Required) The name of a flow that includes a mastering step.
rs:step
The step number of the mastering step in the specified flow. This task uses the settings in the mastering step. Default is 1, which assumes that the first step in the flow is a mastering step.
rs:includeMatchDetails
If true, additional information about each positive match is provided. Default is false.
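The following Python sketch shows one way to prepare an mlSmMatch request without sending it. The helper name build_match_request, the base URL http://localhost:8011, and the sample record URI are assumptions for illustration. Note that Python booleans are converted to the lowercase strings true and false that the endpoint documents.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL of the app server; replace with your own host and port.
BASE_URL = "http://localhost:8011"

def build_match_request(uri, flow_name, step=1, include_match_details=False):
    """Prepare (but do not send) an mlSmMatch POST request. Hypothetical helper."""
    params = {
        "rs:uri": uri,
        "rs:flowName": flow_name,
        "rs:step": step,
        # str(True) would give "True"; the documented values are lowercase.
        "rs:includeMatchDetails": "true" if include_match_details else "false",
    }
    url = f"{BASE_URL}/v1/resources/mlSmMatch?{urlencode(params)}"
    return Request(url, method="POST")

req = build_match_request("/customers/1.json", "YourFlowName", include_match_details=True)
print(req.get_method(), req.full_url)
```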

mlSmMerge (POST)

Merges the specified records according to the settings of the specified mastering step.

POST /v1/resources/mlSmMerge?rs:uri=URI1&rs:uri=URI2&rs:uri=URIn&rs:flowName=YourFlowName&rs:step=1&rs:preview=[true|false]
rs:uri
(Required) The URI of one of the records to merge. You must specify at least two URIs.
rs:flowName
(Required) The name of a flow that includes a mastering step.
rs:step
The step number of the mastering step in the specified flow. This task uses the settings in the mastering step. Default is 1, which assumes that the first step in the flow is a mastering step.
rs:preview
If true, no changes are made to the database and a simulated merged record is returned; otherwise, the merged record is saved to the database. Default is false.
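Because rs:uri is repeated once per record, the query string is easiest to build from a list of key/value pairs rather than a dictionary. The Python sketch below illustrates this; the helper name build_merge_path and the sample URIs are assumptions, and the check for at least two URIs simply mirrors the requirement stated above.

```python
from urllib.parse import urlencode

def build_merge_path(uris, flow_name, step=1, preview=False):
    """Build the path and query for an mlSmMerge POST. Hypothetical helper."""
    if len(uris) < 2:
        raise ValueError("mlSmMerge needs at least two record URIs")
    # A list of tuples lets the same parameter name appear several times.
    params = [("rs:uri", u) for u in uris]
    params += [
        ("rs:flowName", flow_name),
        ("rs:step", str(step)),
        ("rs:preview", "true" if preview else "false"),
    ]
    return "/v1/resources/mlSmMerge?" + urlencode(params)

# preview=True asks for a simulated merged record without changing the database.
path = build_merge_path(["/customers/1.json", "/customers/2.json"], "YourFlowName", preview=True)
print(path)
```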

mlSmMerge (DELETE)

Reverses the set of merges that created the specified merged record.

DELETE /v1/resources/mlSmMerge?rs:mergeURI=URIofMergedRecord&rs:retainAuditTrail=[true|false]&rs:blockFutureMerges=[true|false]
rs:mergeURI
(Required) The URI of the record to unmerge.
rs:retainAuditTrail
If true, the merged record will be moved to an archive collection; otherwise, it will be deleted. Default is true.
rs:blockFutureMerges
If true, the component records will be blocked from being merged together again. Default is true.
Note: This task archives or deletes the specified merged record and unarchives the component records that were combined to create it. If one of the component records is itself a merged record, it remains a merged record.

mlSmNotifications (GET)

Returns the list of notifications about matches that came close to, but did not exceed, the merging threshold.

GET /v1/resources/mlSmNotifications?rs:start=1&rs:pageLength=10
rs:start
The index of the first notification to return.
rs:pageLength
The number of notifications to return.
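To page through notifications, a client can issue one GET per page and advance rs:start by rs:pageLength each time. The Python sketch below only computes the request paths. It assumes that rs:start is 1-based (as the example request above suggests) and that the total number of notifications is already known; the helper name notification_page_paths is hypothetical.

```python
from urllib.parse import urlencode

def notification_page_paths(total, page_length=10):
    """Yield one mlSmNotifications GET path per page (assumes rs:start is 1-based)."""
    for start in range(1, total + 1, page_length):
        query = urlencode({"rs:start": start, "rs:pageLength": page_length})
        yield f"/v1/resources/mlSmNotifications?{query}"

# 25 notifications at 10 per page need three requests.
paths = list(notification_page_paths(total=25, page_length=10))
for p in paths:
    print(p)
```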

mlSmNotifications (POST)

Returns specific values from the list of notifications about matches that came close to, but did not exceed, the merging threshold.

POST /v1/resources/mlSmNotifications?rs:start=1&rs:pageLength=10
rs:start
The index of the first notification to return.
rs:pageLength
The number of notifications to return.
The body of the request must contain a JSON object that specifies the notification values to return. Format:
  { "key": "fieldname" }
Returns the specified values in the format:
  { ...
    extractions: { "/path-to-uri.xml": { "key": "value-of-field" } }
  }
Example: If the notification /uri1.xml contains:
  <Person>
    <PersonFirstName>Bob</PersonFirstName>
    <PersonLastName>Smith</PersonLastName>
  </Person>
And the body of the POST request contains:
  { "firstName": "PersonFirstName" }
The results include an extractions node as follows:
  { ...
    extractions: {
      "/uri1.xml": { "firstName": "Bob" }
    }
  }
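A request like the one in this example could be prepared in Python as sketched here. The helper name build_extraction_request and the base URL http://localhost:8011 are assumptions, and sending a Content-Type header of application/json is a reasonable guess rather than a documented requirement.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL of the app server; replace with your own host and port.
BASE_URL = "http://localhost:8011"

def build_extraction_request(field_map, start=1, page_length=10):
    """Prepare (but do not send) an mlSmNotifications POST. Hypothetical helper."""
    query = urlencode({"rs:start": start, "rs:pageLength": page_length})
    # The body maps each output key to the field whose value should be extracted.
    body = json.dumps(field_map).encode("utf-8")
    return Request(
        f"{BASE_URL}/v1/resources/mlSmNotifications?{query}",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not documented
        method="POST",
    )

req = build_extraction_request({"firstName": "PersonFirstName"})
print(req.get_method(), req.full_url, req.data)
```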

mlSmNotifications (PUT)

Updates the status of the specified notifications.

PUT /v1/resources/mlSmNotifications?rs:uris=array-of-URIs-of-notifications-to-update&rs:status=read
rs:uris
An array of strings containing the URIs of notifications to update.
rs:status
The new status of the notifications. Valid values are read and unread.

mlSmNotifications (DELETE)

Deletes the specified notification.

DELETE /v1/resources/mlSmNotifications?rs:uri=uri-of-notification-to-delete
rs:uri
The URI of the notification to delete.

mlSmHistoryDocument (GET)

Returns the document-level history of the specified merged record.

GET /v1/resources/mlSmHistoryDocument?rs:uri=URIofMergedRecord
rs:uri
(Required) The URI of a merged record.

mlSmHistoryProperties (GET)

Returns the history of the specified property or all properties of a merged record.

GET /v1/resources/mlSmHistoryProperties?rs:uri=URIofMergedRecord&rs:property=YourPropertyName
rs:uri
(Required) The URI of a merged record.
rs:property
The name of the specific property. Default is all properties.
Note: Only document-level provenance is tracked by default. To track property-level provenance, you must set "provenanceGranularityLevel" : "fine". See Set Provenance Granularity Manually.
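Since rs:property is optional, a client can simply leave it out to request the history of all properties. A minimal Python sketch, with the helper name history_properties_path and the sample URI and property name as assumptions:

```python
from urllib.parse import urlencode

def history_properties_path(merged_uri, property_name=None):
    """Build the path and query for an mlSmHistoryProperties GET. Hypothetical helper."""
    params = {"rs:uri": merged_uri}
    # Omitting rs:property falls back to the documented default (all properties).
    if property_name is not None:
        params["rs:property"] = property_name
    return "/v1/resources/mlSmHistoryProperties?" + urlencode(params)

all_props = history_properties_path("/merged/abc.json")
one_prop = history_properties_path("/merged/abc.json", "PersonFirstName")
print(all_props)
print(one_prop)
```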