Data Hub Extensions to the REST Client API

This page lists the Data Hub extensions to the MarkLogic REST Client API.

Administration
Flow Management
Record Management
Job Management

Administration

mlHubversion (GET)

Returns the version of the Data Hub installed in your MarkLogic Server instance.

GET /v1/resources/mlHubversion

mlDebug (GET)

Returns true if debugging is currently enabled for Data Hub in the MarkLogic Server instance; otherwise, false.

GET /v1/resources/mlDebug

mlDebug (POST)

Enables or disables debugging for Data Hub in the MarkLogic Server instance.

POST /v1/resources/mlDebug?rs:enable=[true|false]

rs:enable: (Required) If true, enables debugging for Data Hub in the MarkLogic Server instance; otherwise, debugging is disabled. Default is false.
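
As a minimal sketch of how a client might assemble this request, the Python snippet below builds the mlDebug POST URL. The base URL is a placeholder assumption and is not specified on this page; substitute the host and port of your own Data Hub app server. Authentication and the HTTP call itself are left out.

```python
from urllib.parse import urlencode

# Assumption: placeholder host and port; replace with your own Data Hub app server.
BASE_URL = "http://localhost:8010"

def ml_debug_url(enable: bool) -> str:
    """Build the mlDebug POST URL; rs:enable is sent as the literal 'true' or 'false'."""
    # safe=":" keeps the "rs:" prefix readable instead of encoding it as "rs%3A".
    query = urlencode({"rs:enable": "true" if enable else "false"}, safe=":")
    return f"{BASE_URL}/v1/resources/mlDebug?{query}"

print(ml_debug_url(True))
```

Sending the boolean as an explicit string avoids relying on how a given HTTP library would serialize Python's True and False.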

Flow Management

hubCollector (GET)

GET /v1/resources/hubCollector?rs:flow-name=YourFlowName&rs:step=1&rs:database=FINAL-database&rs:options={}

rs:flow-name: (Required) The name of the flow.
rs:step: (Required) The sequence number of the step that specifies the Source Collection or the Source Query.
rs:database: The name of the database to search. Default is the source database specified in the step.
rs:options: A JSON object containing additional options.
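
Because rs:options carries a JSON object inside a query string, it generally needs to be serialized and URL-encoded. The following Python sketch shows one way to do that; the helper name hub_collector_path is hypothetical, and the optional parameters are simply left out when not supplied so that the documented defaults apply.

```python
import json
from urllib.parse import urlencode

def hub_collector_path(flow_name, step, database=None, options=None):
    """Build the hubCollector request path; optional parameters are omitted when None."""
    params = {"rs:flow-name": flow_name, "rs:step": str(step)}
    if database is not None:
        params["rs:database"] = database
    if options is not None:
        # rs:options is a JSON object, so serialize it before URL-encoding.
        params["rs:options"] = json.dumps(options)
    # safe=":" keeps the "rs:" prefixes unencoded.
    return "/v1/resources/hubCollector?" + urlencode(params, safe=":")

print(hub_collector_path("YourFlowName", 1, database="FINAL-database", options={}))
```

The empty options object is emitted as %7B%7D, the percent-encoded form of {}.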

mlRunFlow (POST)

Runs a step within the flow to process the specified records.

POST /v1/resources/mlRunFlow?rs:job-id=YourJobID&rs:flow-name=YourFlowName&rs:step=1&rs:options={"uris":["comma-separated","uris","of","records","to","process"]}

rs:job-id

A unique job ID to associate with the flow run. This option can be used if the flow run is part of a larger process (e.g., a process orchestrated by NiFi with its own job/process ID). Must not be the same as an existing Data Hub job ID. If not provided, a unique Data Hub job ID will be assigned.

rs:flow-name

The name of the flow.

rs:step

The sequence number of the step to execute. To run multiple specific steps, use your orchestration tool to send one mlRunFlow request for each step.

rs:options

A JSON object containing additional options to pass to the flow.

To specify the list of records to process, add the key uris whose value is an array of the URIs of the records to process.
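
The parameters above can be combined as in the following Python sketch, which prepares one mlRunFlow request path per step, as an orchestration tool would when running several specific steps. The helper name run_flow_paths and the sample URIs are hypothetical. rs:job-id is deliberately omitted here so that Data Hub assigns a job ID to each run.

```python
import json
from urllib.parse import urlencode

def run_flow_paths(flow_name, steps, uris):
    """Return one mlRunFlow request path per step, in the order given."""
    # The records to process go under the "uris" key of the rs:options JSON object.
    options = json.dumps({"uris": list(uris)})
    paths = []
    for step in steps:
        params = {
            "rs:flow-name": flow_name,
            "rs:step": str(step),
            "rs:options": options,
        }
        paths.append("/v1/resources/mlRunFlow?" + urlencode(params, safe=":"))
    return paths

for path in run_flow_paths("YourFlowName", [1, 2], ["/customers/1.json", "/customers/2.json"]):
    print(path)
```

Each path would then be sent as a separate POST request by whatever HTTP client the orchestration tool uses.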

Record Management

MarkLogic Data Hub provides REST Client API extensions that allow you to match, merge, and unmerge records programmatically without running a flow.

mlSmMatch (POST)

Compares the specified record with other records and returns the list of possible matches.

POST /v1/resources/mlSmMatch?rs:uri=URIofFocusRecord&rs:flowName=YourFlowName&rs:step=1&rs:includeMatchDetails=[true|false]&rs:start=1&rs:pageLength=10

rs:uri: (Required) The URI of the record to compare with other records.
rs:flowName: (Required) The name of a flow that includes a mastering step.
rs:step: The step number of the mastering step in the specified flow. This task uses the settings in the mastering step. Default is 1, which assumes that the first step in the flow is a mastering step.
rs:includeMatchDetails: If true, additional information about each positive match is provided. Default is false.
rs:start: The index of the first match to return. Default is 1.
rs:pageLength: The number of matches to return. Default is 20.
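
Since rs:start is a 1-based index rather than a page number, a client that pages through results has to convert between the two. The Python sketch below shows that arithmetic; the helper name match_page_path and the sample URI are hypothetical.

```python
from urllib.parse import urlencode

def match_page_path(uri, flow_name, page, page_length=20, step=1, include_details=False):
    """Build the mlSmMatch request path for a 1-based page number."""
    if page < 1 or page_length < 1:
        raise ValueError("page and page_length must be positive")
    # rs:start is 1-based: page 1 starts at 1, page 2 at page_length + 1, and so on.
    start = (page - 1) * page_length + 1
    params = {
        "rs:uri": uri,
        "rs:flowName": flow_name,
        "rs:step": str(step),
        "rs:includeMatchDetails": "true" if include_details else "false",
        "rs:start": str(start),
        "rs:pageLength": str(page_length),
    }
    return "/v1/resources/mlSmMatch?" + urlencode(params, safe=":")

print(match_page_path("/customers/1.json", "YourFlowName", page=3, page_length=10))
```

With a page length of 10, the third page begins at index 21.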

mlSmMerge (POST)

Merges the specified records according to the settings of the specified mastering step.

POST /v1/resources/mlSmMerge?rs:uri=URI1&rs:uri=URI2&rs:uri=URIn&rs:flowName=YourFlowName&rs:step=1&rs:preview=[true|false]

rs:uri: (Required) The URI of one of the records to merge. You must specify at least two URIs.
rs:flowName: (Required) The name of a flow that includes a mastering step.
rs:step: The step number of the mastering step in the specified flow. This task uses the settings in the mastering step. Default is 1, which assumes that the first step in the flow is a mastering step.
rs:preview: If true, no changes are made to the database and a simulated merged record is returned; otherwise, the merged record is saved to the database. Default is false.
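
The repeated rs:uri parameter can be awkward to build by hand. The following Python sketch passes a list of key/value pairs to urlencode so that each URI becomes its own rs:uri entry, and it rejects fewer than two URIs. The helper name merge_path and the sample URIs are hypothetical. Note that this helper defaults to preview=True as a cautious choice of the sketch, whereas the endpoint's own default is false.

```python
from urllib.parse import urlencode

def merge_path(uris, flow_name, step=1, preview=True):
    """Build the mlSmMerge POST path with one rs:uri entry per record."""
    if len(uris) < 2:
        raise ValueError("at least two URIs are required to merge")
    # A list of pairs (rather than a dict) allows the same key to repeat.
    pairs = [("rs:uri", uri) for uri in uris]
    pairs += [
        ("rs:flowName", flow_name),
        ("rs:step", str(step)),
        ("rs:preview", "true" if preview else "false"),
    ]
    return "/v1/resources/mlSmMerge?" + urlencode(pairs, safe=":")

print(merge_path(["/customers/1.json", "/customers/2.json"], "YourFlowName"))
```

Previewing first lets a caller inspect the simulated merged record before repeating the request with preview=False.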

mlSmMerge (DELETE)

Reverses the set of merges that created the specified merged record.

DELETE /v1/resources/mlSmMerge?rs:mergeURI=URIofMergedRecord&rs:retainAuditTrail=[true|false]&rs:blockFutureMerges=[true|false]

rs:mergeURI: (Required) The URI of the record to unmerge.
rs:retainAuditTrail: If true, the merged record will be moved to an archive collection; otherwise, it will be deleted. Default is true.
rs:blockFutureMerges: If true, the component records will be blocked from being merged together again. Default is true.

Note: This task archives or deletes the specified merged record and unarchives the component records that were combined to create it. If one of the component records is itself a merged record, that component record remains a merged record.

mlSmNotifications (GET)

Returns the list of notifications about matches that came close to, but did not exceed, the merging threshold.

GET /v1/resources/mlSmNotifications?rs:start=1&rs:pageLength=10

rs:start: The index of the first notification to return. Default is 1.
rs:pageLength: The number of notifications to return. Default is 10.

mlSmHistoryDocument (GET)

Returns the document-level history of the specified merged record.

GET /v1/resources/mlSmHistoryDocument?rs:uri=URIofMergedRecord

rs:uri: (Required) The URI of a merged record.

mlSmHistoryProperties (GET)

Returns the history of the specified property or all properties of a merged record.

GET /v1/resources/mlSmHistoryProperties?rs:uri=URIofMergedRecord&rs:property=YourPropertyName

rs:uri: (Required) The URI of a merged record.
rs:property: The name of the specific property. Default is all properties.

Note: Only document-level provenance is tracked by default. To track property-level provenance, you must set "provenanceGranularityLevel" : "fine". See Set Provenance Granularity Manually.

Job Management

mlJobs (GET)

Returns job information based on the specified parameters.

GET /v1/resources/mlJobs?rs:job-id=YourJobID&rs:status=&rs:flowNames=YourFlowName&rs:flow-name=YourFlowName

rs:job-id: The ID of the job. Used to return the job document associated with the specified job ID. You can specify either job-id or status, but not both.
rs:status: The status of the job: started, finished, finished_with_errors, running, failed, stop-on-error, or canceled. Used to return the list of job documents associated with all jobs with the specified status. You can specify either job-id or status, but not both.
rs:flowNames: The name of the flow. Used to return the job ID and job information of the latest run that includes the specified flow name. To specify additional flow names, repeat the parameter.
rs:flow-name: The name of the flow. Used to return the list of job documents associated with all runs that include the specified flow name.
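
The constraints described above can be checked on the client before a request is sent. The Python sketch below rejects a request that sets both the job ID and the status, and validates the status against the values listed above. The helper name jobs_path is hypothetical, and how the endpoint treats other parameter combinations is not covered on this page, so the sketch does not restrict them.

```python
from urllib.parse import urlencode

# Status values as listed in the rs:status description.
JOB_STATUSES = {
    "started", "finished", "finished_with_errors", "running",
    "failed", "stop-on-error", "canceled",
}

def jobs_path(job_id=None, status=None, flow_name=None):
    """Build the mlJobs GET path, enforcing 'job ID or status, but not both'."""
    if job_id is not None and status is not None:
        raise ValueError("specify either job_id or status, not both")
    if status is not None and status not in JOB_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    params = {}
    if job_id is not None:
        params["rs:job-id"] = job_id
    if status is not None:
        params["rs:status"] = status
    if flow_name is not None:
        params["rs:flow-name"] = flow_name
    query = urlencode(params, safe=":")
    return "/v1/resources/mlJobs" + ("?" + query if query else "")

print(jobs_path(status="finished_with_errors"))
```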

mlBatches (GET)

Returns the batch documents for the specified step or batch within the specified job.

GET /v1/resources/mlBatches?rs:jobid=YourJobID&rs:step=1&rs:batchid=YourBatchID

rs:jobid: (Required) The ID of the job whose batch documents to return.
rs:step: The sequence number of the step whose batch documents to return. You must specify either step or batchid, but not both.
rs:batchid: The ID of the batch whose documents to return. You must specify either step or batchid, but not both.
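
The "either step or batch ID, but not both" rule can be enforced before the request is made, as in this Python sketch. The helper name batches_path and the batch ID value are hypothetical.

```python
from urllib.parse import urlencode

def batches_path(job_id, step=None, batch_id=None):
    """Build the mlBatches GET path; exactly one of step or batch_id must be given."""
    if not job_id:
        raise ValueError("job_id is required")
    # True when both are missing or both are present; either case is invalid.
    if (step is None) == (batch_id is None):
        raise ValueError("specify exactly one of step or batch_id")
    params = {"rs:jobid": job_id}
    if step is not None:
        params["rs:step"] = str(step)
    else:
        params["rs:batchid"] = batch_id
    return "/v1/resources/mlBatches?" + urlencode(params, safe=":")

print(batches_path("YourJobID", step=1))
print(batches_path("YourJobID", batch_id="YourBatchID"))
```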