Customizing Steps
Various ways to customize how your data is processed.
You can customize a step in multiple ways.
Replace a predefined process.
To replace a predefined process (ingestion/loading, mapping, or mastering), create a custom step and choose the appropriate custom step type.
Using Hub Central:
Using Gradle:
Retrieve source data with a script.
The source query can be:
- a CTS query that retrieves a set of URIs, or
- a custom script that retrieves a set of items of any type, including URIs.
Avoid writing source queries that return documents based on URIs. When you load/ingest delimited text files into MarkLogic Server, URIs are randomly generated each time you load/ingest. To generate the same URIs each time you load/ingest delimited text files, use MLCP.
Non-URI items must be handled by a custom step module.
To define the source data passed to a custom step module, manually set the source query to a script in the flow configuration file.
- In the flow configuration file, manually modify the custom step section:
... "sourceQuery" : "string-containing-javascript-code", "sourceQueryIsScript" : true, ...
- Set sourceQuery to a string containing any script that can be passed to xdmp.eval.
- Set sourceQueryIsScript to true.
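Because sourceQuery is a JSON string that itself contains JavaScript, any double quotes inside the script must be escaped. As a minimal sketch, the two properties can be generated with JSON.stringify instead of being typed by hand. The helper name scriptedSourceQuery and the sample script are illustrative assumptions, not part of Data Hub, and where the fragment belongs within the step section is not shown here.

```javascript
// Hypothetical helper (not part of Data Hub): returns the two properties
// that mark a source query as a script.
function scriptedSourceQuery(script) {
  if (typeof script !== "string" || script.trim() === "") {
    throw new TypeError("script must be a non-empty string");
  }
  return {
    sourceQuery: script,       // any script that can be passed to xdmp.eval
    sourceQueryIsScript: true  // marks sourceQuery as a script
  };
}

// Illustrative script only; note the double quotes that need escaping in JSON.
const sampleScript = 'cts.uris(null, null, cts.collectionQuery("some-collection"))';

const fragment = scriptedSourceQuery(sampleScript);

// JSON.stringify escapes the inner double quotes, so the output can be
// pasted into the flow configuration file as-is.
const fragmentJson = JSON.stringify(fragment, null, 2);
console.log(fragmentJson);
```

Generating the string this way avoids hand-escaping mistakes, which are easy to make once the script grows beyond a single call.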
Example: To retrieve values from an index for the step to process:
"sourceQuery" : "cts.values([cts.elementReference('someIndexedProperty')], null, null, cts.collectionQuery('some-collection'))",
Other options as parameters
You can refer to the value of any option using cts.elementReference(options.name-of-option). Example:
"sourceQuery" : "cts.valueTuples([cts.elementReference(options.myOption)])",
The option can be:
- included in the options section of the same step, or
- specified at runtime; e.g., as a parameter to the Gradle task hubRunFlow.
Items as arrays
If you use a function that returns arrays, such as cts.valueTuples or cts.elementValueCoOccurrences, in your sourceQuery script, the arrays are serialized into strings before being grouped into batches. Therefore, in your custom step module, you must convert each item back to an array before processing.
You can use xdmp.eval to perform the conversion. Example:
const array = fn.head(xdmp.eval(content.uri));
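The conversion can be wrapped in a small helper that fails early when an item is not an array. The sketch below is meant for experimenting outside the server. On MarkLogic Server the evaluator would be the xdmp.eval call shown above. The names itemToArray, evaluate, and jsonEvaluate are hypothetical, and the stand-in evaluator assumes the serialized form is valid JSON, which the source does not confirm.

```javascript
// Hypothetical helper: converts a serialized item back into an array.
// `evaluate` is an injected function; on MarkLogic Server it would be
// (text) => fn.head(xdmp.eval(text)), as in the example above.
function itemToArray(content, evaluate) {
  const value = evaluate(content.uri);
  if (!Array.isArray(value)) {
    throw new TypeError("Expected a serialized array, got: " + content.uri);
  }
  return value;
}

// Stand-in evaluator for local experiments only. ASSUMPTION: the item is
// serialized as valid JSON; on the server, use xdmp.eval instead.
const jsonEvaluate = (text) => JSON.parse(text);

// A sample item shaped like a serialized two-value tuple.
const sampleContent = { uri: '["Smith","Boston"]' };
const [lastName, city] = itemToArray(sampleContent, jsonEvaluate);
console.log(lastName, city);
```

Injecting the evaluator keeps the array check separate from the server-specific call, so the same logic can be exercised in either environment.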
Each item is passed to the custom step module as content.uri, whether the item is a URI or a serialized array.
Add an interceptor or a custom hook to a step.
A custom hook is processed as a separate server transaction, and changes can be saved by the custom hook module to the database within its own transaction.
In contrast, an interceptor runs in the same transaction as the core step processes; i.e., immediately after the core step processes and before the data is saved.
Learn more about interceptors and custom hooks.
To add an interceptor to a step:
- In Hub Central,
- Edit the Advanced Settings of the step.
- Enter a JSON object in the Interceptor field with information about the interceptor module.
[ { "path": "/uri/of/custom/module/in/modules/database/a.sjs", "vars": { "myParameter": "myParameterValue" }, "when": "beforeContentPersisted" } ]
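A misplaced comma is enough to make the Interceptor value invalid JSON, so it can help to check the text before pasting it into the field. The sketch below shows one way to do that with JSON.parse. The function checkInterceptors is hypothetical and not part of Data Hub, and treating path and when as required string properties is an assumption based only on the example above.

```javascript
// Hypothetical pre-flight check (not part of Data Hub): parses the text and
// confirms each entry has the properties used in the example.
function checkInterceptors(text) {
  const interceptors = JSON.parse(text); // throws SyntaxError on malformed JSON
  if (!Array.isArray(interceptors)) {
    throw new TypeError("expected a JSON array of interceptor objects");
  }
  interceptors.forEach((entry, index) => {
    // ASSUMPTION: "path" and "when" are required strings.
    for (const key of ["path", "when"]) {
      if (entry === null || typeof entry[key] !== "string") {
        throw new TypeError(`interceptor ${index}: "${key}" must be a string`);
      }
    }
  });
  return interceptors;
}

const interceptorText = `[
  {
    "path": "/uri/of/custom/module/in/modules/database/a.sjs",
    "vars": { "myParameter": "myParameterValue" },
    "when": "beforeContentPersisted"
  }
]`;

const checkedInterceptors = checkInterceptors(interceptorText);
console.log(checkedInterceptors.length + " interceptor(s) parsed");
```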
To add a custom hook to a step:
- In Hub Central,
- Edit the Advanced Settings of the step.
- Enter a JSON object in the Custom Hook field with information about the custom hook module.
{ "module" : "/custom-modules/your-step-type/your-hook-directory/your-hook-module-name.sjs", "parameters" : {}, "user" : "data-hub-operator", "runBefore" : false }
- Add a Custom Hook to a Step Manually
Create custom mapping functions.
To customize the mapping process, you can create custom mapping functions to use in your mapping definition, in addition to the predefined Data Hub mapping functions.