Before you begin
You need:
- A REST client
- A MarkLogic Data Hub project
Procedure
- Create a flow.
- Create an ingestion step.
- Run your ingestion step using a REST call.
POST /v1/documents?transform=mlRunIngest&trans:flow-name=YourFlowName&trans:step=1&trans:job-id=YourJobID&trans:options={}
- flow-name: (Optional) The name of the flow. The default is default-ingestion.
- step: (Optional if flow-name is not specified) The sequence number of the ingestion step to execute. If flow-name is not specified, the default is 1.
- job-id: (Optional) A unique job ID to associate with the flow run. This option can be used if the flow run is part of a larger process (e.g., a process orchestrated by NiFi with its own job/process ID). Must not be the same as an existing Data Hub job ID. If not provided, a unique Data Hub job ID will be assigned.
- options: (Optional) A JSON object containing additional options.
  To add the CSV filename to the header of the envelope, include "inputFileType" : "csv" in the JSON object.
  To override the values of step properties at runtime, include "propertyname" : "valueToUse" in the JSON object, as shown in the example after this list. The step properties whose values can be overridden at runtime are:
  - outputFormat with a value of text, json, xml, or binary.
  - provenanceGranularityLevel with a value of coarse, fine, or off.
  - disableJobOutput with a value of true or false.
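For example, the following curl command ingests a single document through step 1 of a flow while overriding outputFormat at runtime. This is a sketch, not a verbatim example from the Data Hub documentation: the host, credentials, flow name, document URI, and input file (customer1.json) are assumptions to adapt to your environment; the port assumes the default staging server (8010); and the single-document PUT form of /v1/documents is used here because the document URI is supplied explicitly.

# trans:options is the URL-encoded form of {"outputFormat":"json"}
curl --anyauth --user flow-operator:password -X PUT -H "Content-Type: application/json" -d @customer1.json "http://localhost:8010/v1/documents?uri=/customers/customer1.json&transform=mlRunIngest&trans:flow-name=YourFlowName&trans:step=1&trans:options=%7B%22outputFormat%22%3A%22json%22%7D"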
See Data Hub Extensions to the REST Client API.
What to do next
Create another flow using Gradle, as sketched below. Then add mapping steps and mastering steps (either a combined mastering step or separate matching and merging steps) to enhance the ingested data.
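A minimal sketch of that next step, assuming your project applies the Data Hub Gradle plugin; the flow name is a placeholder:

# scaffold a new flow in the project (flow name is a placeholder)
gradle hubCreateFlow -PflowName=MyCustomerFlow

Run gradle tasks to confirm the Data Hub tasks available in your Data Hub version.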