Before you begin
You need:
- A REST client
- A MarkLogic Data Hub project
Procedure
- Create a flow.
- Create an ingestion step.
- Run your ingestion step using a REST call that applies the mlRunIngest transform. (A complete example request follows the parameter list below.)
POST /v1/documents?transform=mlRunIngest&trans:flow-name=YourFlowName&trans:step=1&trans:job-id=YourJobID&trans:options={}
- flow-name
  (Optional) The name of the flow. The default is default-ingestion.
- step
  (Optional if flow-name is not specified) The sequence number of the ingestion step to execute. If flow-name is not specified, the default is 1.
- job-id
  (Optional) A unique job ID to associate with the flow run. Use this option if the flow run is part of a larger process, such as a process orchestrated by NiFi with its own job/process ID. The ID must not match an existing Data Hub job ID. If not provided, a unique Data Hub job ID is assigned.
- options
  (Optional) A JSON object containing additional options.
  To add the CSV filename to the header of the envelope, include "inputFileType" : "csv" in the JSON object.
  To override the values of step properties at runtime, include "propertyname" : "valueToUse" in the JSON object. The step properties whose values can be overridden at runtime are:
  - outputFormat, with a value of text, json, xml, or binary.
  - provenanceGranularityLevel, with a value of coarse, fine, or off.
  - disableJobOutput, with a value of true or false.
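For example, an options object that marks the input as CSV, overrides the output format, and turns off provenance tracking could look like this:

{ "inputFileType" : "csv", "outputFormat" : "json", "provenanceGranularityLevel" : "off" }

As an illustration only, here is one way a complete request might look with curl. This sketch uses PUT /v1/documents with an explicit uri (the endpoint accepts the same mlRunIngest transform parameters) and assumes a Data Hub staging app server at localhost:8010, hypothetical credentials (flow-operator:password), and a local file customer1.json. Note that the JSON passed in trans:options must be URL-encoded, so {"outputFormat":"json"} becomes %7B%22outputFormat%22%3A%22json%22%7D:

curl --anyauth -u flow-operator:password -X PUT -T ./customer1.json \
  -H "Content-type: application/json" \
  "http://localhost:8010/v1/documents?uri=/ingest/customer1.json&transform=mlRunIngest&trans:flow-name=YourFlowName&trans:step=1&trans:options=%7B%22outputFormat%22%3A%22json%22%7D"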
For more information about the mlRunIngest transform, see Data Hub Extensions to the REST Client API.
What to do next
Create another flow using Gradle. Then add mapping and mastering (combined or split) steps to enhance the ingested data.
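If your project uses the ml-data-hub Gradle plugin, you can create the flow from the command line. A minimal sketch, assuming the plugin is applied to the project and using a hypothetical flow name:

./gradlew hubCreateFlow -PflowName=MyNewFlow

The generated flow file (typically under the project's flows directory) can then be edited to add the mapping and mastering steps.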