Ingest Using Data Hub REST Extensions

Data Hub provides extensions to the MarkLogic REST Client API that you can call from any REST client.

Before you begin

You need:

  • A REST client
  • A MarkLogic Data Hub project
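
If you use curl as your REST client, you can confirm that the Data Hub REST extensions are available by listing the transforms installed on the staging app server. This is a minimal sketch, assuming a local MarkLogic instance with the Data Hub staging server on its default port (8010) and digest authentication; the credentials are placeholders.

    # List the REST transforms installed on the staging server;
    # the response should include mlRunIngest.
    curl --digest -u your-user:your-password \
      "http://localhost:8010/v1/config/transforms"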

Procedure

  1. Create a flow.
  2. Create an ingestion step.
  3. Run your ingestion step using a REST call (see the curl sketch after this procedure).
    POST /v1/documents?transform=mlRunIngest&trans:flow-name=YourFlowName&trans:step=1&trans:job-id=YourJobID&trans:options={}
    flow-name
    (Optional) The name of the flow. The default is default-ingestion.
    step
    The sequence number of the ingestion step to execute. Required if flow-name is specified; if flow-name is not specified, the default is 1.
    job-id
    (Optional) A unique job ID to associate with the flow run, for use when the flow run is part of a larger process (for example, a process orchestrated by NiFi with its own job or process ID). The value must not match an existing Data Hub job ID. If not provided, Data Hub assigns a unique job ID to the flow run.
    options
    (Optional) A JSON object containing additional options.

    To add the CSV filename to the header of the envelope, include "inputFileType" : "csv" in the JSON object.

    To override the values of step properties at runtime, include "propertyname" : "valueToUse" in the JSON object.

    The step properties whose values can be overridden at runtime are:

    • outputFormat with a value of text, json, xml, or binary.
    • provenanceGranularityLevel with a value of coarse, fine, or off.
    • disableJobOutput with a value of true or false.

    See Data Hub Extensions to the REST Client API.
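
For example, the following curl commands run the first step of a flow while inserting a single JSON document. This is a minimal sketch, assuming a local MarkLogic instance with the Data Hub staging app server on its default port (8010) and digest authentication; the credentials, flow name, directory, and customer1.json/customer2.json files are placeholders. Using POST with the extension parameter lets the server assign the document URI.

    # Ingest one JSON document through the first step of the flow.
    curl --digest -u your-user:your-password -X POST \
      -H "Content-Type: application/json" \
      --data-binary @customer1.json \
      "http://localhost:8010/v1/documents?extension=json&directory=/ingest/&transform=mlRunIngest&trans:flow-name=YourFlowName&trans:step=1"

    # The same call with runtime overrides passed in trans:options. The value
    # must be URL-encoded JSON; decoded, it reads
    # {"provenanceGranularityLevel":"off","disableJobOutput":true}.
    curl --digest -u your-user:your-password -X POST \
      -H "Content-Type: application/json" \
      --data-binary @customer2.json \
      "http://localhost:8010/v1/documents?extension=json&directory=/ingest/&transform=mlRunIngest&trans:flow-name=YourFlowName&trans:step=1&trans:options=%7B%22provenanceGranularityLevel%22%3A%22off%22%2C%22disableJobOutput%22%3Atrue%7D"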

What to do next

Create another flow using Gradle. Then add mapping and mastering (combined or split) steps to enhance the ingested data.
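
As a sketch of that next step, assuming the Data Hub Gradle plugin is already applied to your project (the flow name below is a placeholder), you can scaffold the new flow with the hubCreateFlow task:

    # Scaffold an empty flow file in the project's flows/ directory.
    ./gradlew hubCreateFlow -PflowName=CurateCustomers

Then edit the generated flow file (flows/CurateCustomers.flow.json in a default project layout) to define the mapping and mastering steps.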