Editing a Custom Step Module
Custom Step Modules
A custom step allows you to add your own custom functionality anywhere in the flow sequence.
To perform specialized tasks in Data Hub, you can create a custom step that calls a custom step module. The custom step module, which is created either by QuickStart when you configure a custom step or by the Gradle task hubCreateStepDefinition, contains instructions to help you develop your module.
Required Inputs
- Content object. An object that contains everything that you want to process. It can also be an array of Content objects, one per document or record. Each Content object consists of the following:
  - content.uri. The URI of the document or record to process.
  - content.context. All the metadata associated with the document or record (found in the database, but not included in the envelope). Examples: permissions, collection tags, temporal settings.
  - content.value. The information to process in the custom module.
- Options object. Custom settings, such as parameters (as JSON key-value pairs), that are passed to the step.
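As a rough illustration of these inputs, the sketch below builds plain JavaScript objects shaped like the Content and options objects described above, plus a helper that accepts either a single Content object or an array. The sample URI, the property names inside context (collections, permissions), and the targetCollection option are made-up values for illustration, and toContentArray is a hypothetical helper, not part of Data Hub.

```javascript
// Hypothetical sample of a single Content object (all values are illustrative).
const sampleContent = {
  uri: "/customers/1001.json",
  // Assumed layout for the metadata; only the three top-level keys come from the text above.
  context: {
    collections: ["raw-customers"],
    permissions: []
  },
  value: { lastName: "Smith", firstName: "John" }
};

// Hypothetical options object: JSON key-value parameters passed to the step.
const sampleOptions = { targetCollection: "processed-customers" };

// Normalizes the input so the rest of the module can always iterate,
// whether it received one Content object or an array of them.
function toContentArray(content) {
  return Array.isArray(content) ? content : [content];
}

for (const item of toContentArray(sampleContent)) {
  console.log(item.uri, Object.keys(item.value), sampleOptions.targetCollection);
}
```

Normalizing to an array up front keeps the processing logic identical for both input shapes.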
Required Outputs
Step Type | Required Outputs |
---|---|
Custom-Ingestion | A Content object. |
| | A Content object or an array of Content objects. |
Each Content object contains the processed data to be written to the database and consists of the following:
- content.uri. The URI of the document or record to overwrite or create in the database. To create a new document, the URI must be unique; if a document with the same URI already exists, its data is overwritten and the changes are logged in the provenance data.
- content.context. All the metadata to associate with the document or record.
- content.value. The information to store in the database.
- content.provenance. (Optional) Additional property-level provenance information to store, if the provenance granularity is set to fine.
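The following is a minimal sketch of a function that takes one Content object and returns a new Content object with the three required components. The function name processContent, the uriPrefix option, and the upper-casing of lastName are assumptions chosen only to make the example concrete; they are not taken from the Data Hub libraries.

```javascript
// Sketch of a custom step body: takes one Content object and returns a new one.
// The function name and the uriPrefix option are hypothetical.
function processContent(content, options) {
  const prefix = (options && options.uriPrefix) || "/processed";
  return {
    // A document that already exists at this URI would be overwritten.
    uri: prefix + content.uri,
    // Metadata to associate with the stored document (passed through unchanged here).
    context: content.context,
    // The information to store; as a stand-in for real processing,
    // the lastName property is upper-cased.
    value: {
      ...content.value,
      lastName: String(content.value.lastName).toUpperCase()
    }
  };
}

const processed = processContent(
  { uri: "/customers/1001.json", context: {}, value: { lastName: "Smith" } },
  { uriPrefix: "/processed" }
);
console.log(JSON.stringify(processed));
```

The input object is not modified; the function builds and returns a fresh object, which keeps the original available for comparison or provenance tracking.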
Provenance in a Custom Step
You can choose to track property-level provenance information, in addition to the default document-level provenance information. See Set Provenance Granularity Manually.
In a custom step, you can also specify which property-level provenance information is tracked. To do so, meet the following requirements:
- The Content objects returned by your custom module must have a content.provenance component.
- content.provenance must contain the properties that you want to track and their values.
- The value of content.provenance must be in the following format. Data Hub will convert it to the PROV-XML schema before storing it in the JOBS database.
{
  "<originalURI>": {
    "<originalXPathOrPropertyName>": {
      "destination": "<XPathOrPropertyInNewDocument>",
      "value": "<newValue>"
    }
  }
}
Example 1: If you mapped the lastName property to the surName property, you can set content.provenance to the following:
{
  "/26451baa-fe14-471f-bd77-364ac3f64c82.json": {
    "lastName": {
      "destination": "surName",
      "value": "Smith"
    }
  }
}
Example 2: If your custom module pulled information from multiple documents into the current one, you can combine the provenance information of the source documents into a single content.provenance.
{
  "/26451baa-fe14-471f-bd77-364ac3f64c82.json": {
    "lastName": {
      "destination": "surName",
      "value": "Smith"
    }
  },
  "/5455fd37-6d96-4883-9349-8e79fa700145.json": {
    "firstName": {
      "destination": "givenName",
      "value": "John"
    }
  }
}
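A structure like the one in Example 2 can also be assembled programmatically. The sketch below shows one possible way to do it; buildProvenance and the record shape it accepts (sourceUri, sourceProperty, destination, value) are hypothetical conveniences, not Data Hub APIs.

```javascript
// Builds a content.provenance object from a list of mapping records.
// Each record is { sourceUri, sourceProperty, destination, value };
// this record shape is a hypothetical convenience, not a Data Hub type.
function buildProvenance(records) {
  const provenance = {};
  for (const record of records) {
    // Group entries by the URI of the source document.
    if (!provenance[record.sourceUri]) {
      provenance[record.sourceUri] = {};
    }
    provenance[record.sourceUri][record.sourceProperty] = {
      destination: record.destination,
      value: record.value
    };
  }
  return provenance;
}

// Same data as Example 2: two source documents feeding one new document.
const provenanceExample = buildProvenance([
  {
    sourceUri: "/26451baa-fe14-471f-bd77-364ac3f64c82.json",
    sourceProperty: "lastName",
    destination: "surName",
    value: "Smith"
  },
  {
    sourceUri: "/5455fd37-6d96-4883-9349-8e79fa700145.json",
    sourceProperty: "firstName",
    destination: "givenName",
    value: "John"
  }
]);
console.log(JSON.stringify(provenanceExample, null, 2));
```

The result would then be assigned to the provenance component of the returned Content object.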
If content.provenance is not in the Content objects returned by your custom module and granularity is set to fine for the step, only the default document-level provenance information will be tracked (same as coarse). No error is thrown.
Best Practices
- Although you can code your custom module in XQuery, MarkLogic recommends using JavaScript.
- Use the DataHub object, which gives you access to the Data Hub libraries. For example, the DataHub object can generate an envelope around an XML or a JSON document.
- You can handle errors in two ways:
  - If you are using an orchestration application (e.g., NiFi),
    - You can throw an error inside the module. Every thrown error is reported back to the orchestrator, where it is logged with the URI of the document that failed.
    - In another step (which can also be in another flow), you can search the orchestrator log for documents with a specific error and fix them accordingly.
  - Instead of throwing an error,
    - You can add a special collection tag to the document that failed.
    - In another step, you can search for documents with that collection tag and fix them accordingly.
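The two error-handling approaches can be sketched in plain JavaScript as follows. The validation rule (a required lastName property), the needs-review collection name, and the assumption that collection tags live in content.context.collections are all illustrative choices, not confirmed Data Hub behavior.

```javascript
// Approach 1: throw, so that an orchestrator can log the failure.
// The message includes the URI to make the failed document easy to find.
function processOrThrow(content) {
  if (!content.value || !content.value.lastName) {
    throw new Error("Missing lastName in " + content.uri);
  }
  return content;
}

// Approach 2: tag the failed document with a collection instead of throwing.
// "needs-review" is a hypothetical collection name, and keeping collections
// under content.context.collections is an assumption.
function processOrTag(content) {
  if (content.value && content.value.lastName) {
    return content;
  }
  const context = content.context || {};
  const collections = (context.collections || []).concat("needs-review");
  // Return a copy so that the input object is left untouched.
  return { ...content, context: { ...context, collections } };
}

const badDoc = {
  uri: "/customers/1002.json",
  context: { collections: ["raw"] },
  value: {}
};

let thrownMessage = null;
try {
  processOrThrow(badDoc);
} catch (e) {
  thrownMessage = e.message;
}
console.log(thrownMessage);

const taggedDoc = processOrTag(badDoc);
console.log(taggedDoc.context.collections);
```

With the second approach the step keeps running for the remaining documents, and a later step can query the tagged collection to repair the failures.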