Customizing Steps

Various ways to customize how your data is processed.

You can customize a step in multiple ways.

Replace a predefined process.

To replace a predefined process (ingestion/loading, mapping, or mastering), create a custom step and choose the appropriate custom step type.

Using Hub Central:

Using Gradle:

Retrieve source data with a script.

The source query can be:

  • a CTS query that retrieves a set of URIs, or
  • a custom script that retrieves a set of items of any type, including URIs.
Important: MarkLogic recommends writing source queries that return documents based on collection tags or information in the documents.

Avoid writing source queries that return documents based on URIs. When you load/ingest delimited text files into MarkLogic Server, URIs are randomly generated each time you load/ingest. To generate the same URIs each time you load/ingest delimited text files, use MLCP.

Non-URI items must be handled by a custom step module.

To define the source data passed to a custom step module, manually set the source query to a script in the flow configuration file.

  1. In the flow configuration file, manually modify the custom step section:
      ...
      "sourceQuery" : "string-containing-javascript-code",
      "sourceQueryIsScript" : true,
      ...
    
    • Set sourceQuery to a string containing any script that can be passed to xdmp.eval.
    • Set sourceQueryIsScript to true.

Example: To retrieve values from an index for the step to process:

   "sourceQuery" : "cts.values([cts.elementReference('someIndexedProperty')], null, null, cts.collectionQuery('some-collection'))",
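
Because the script is stored inside a JSON string, any double quotes, backslashes, or line breaks in it must be escaped before the flow configuration file is valid JSON. The following is a minimal sketch of one way to produce the two properties, assuming you prepare the fragment with a JavaScript runtime such as Node.js outside MarkLogic. The helper name buildSourceQueryFragment is hypothetical and is not part of Data Hub.

```javascript
// Hypothetical helper (not a Data Hub API): builds the two properties to
// paste into the custom step section of the flow configuration file.
function buildSourceQueryFragment(script) {
  return {
    sourceQuery: script.trim(),
    sourceQueryIsScript: true
  };
}

// The same script as in the example above, written as ordinary JavaScript text.
const script = `
cts.values([cts.elementReference('someIndexedProperty')], null, null, cts.collectionQuery('some-collection'))
`;

const fragment = buildSourceQueryFragment(script);

// JSON.stringify escapes quotes, backslashes, and newlines inside the script,
// so the result can be copied into the JSON file as-is.
const fragmentJson = JSON.stringify(fragment, null, 2);
console.log(fragmentJson);
```

Copy the two printed properties into the custom step section; the surrounding braces belong to the sketch, not to the flow file.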

Other options as parameters

In the script, you can refer to the value of any option as options.name-of-option; for example, you can pass it to cts.elementReference. Example:

   "sourceQuery" : "cts.valueTuples([cts.elementReference(options.myOption)])",

The option can be:

  • included in the options section of the same step, or
  • specified at runtime; e.g., as a parameter to the Gradle task hubRunFlow.

Items as arrays

If you use a function that returns arrays, such as cts.valueTuples or cts.elementValueCoOccurrences, in your sourceQuery script, the arrays are serialized into strings before being grouped into batches. Therefore, in your custom step module, you must convert each item back to an array before processing.

You can use xdmp.eval to perform the conversion. Example:

   const array = fn.head(xdmp.eval(content.uri));
Note: The item passed to the step module is always stored in content.uri, whether the item is a URI or a serialized array.
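
The conversion can be sketched inside a custom step module as follows. This is a minimal illustration, not the Data Hub implementation: xdmp and fn are the MarkLogic server-side globals, while the shape of the returned object and the /hypothetical/item/ URI pattern are assumptions made for the example.

```javascript
// Converts the serialized item in content.uri back into an array.
// xdmp and fn are MarkLogic globals, looked up only when the function runs.
function toArray(content) {
  return fn.head(xdmp.eval(content.uri));
}

// Sketch of a custom step module's main function.
function main(content, options) {
  const values = toArray(content);

  // Hypothetical processing: build a new URI from the array members and
  // keep the members as the value. Both choices are assumptions.
  return {
    uri: "/hypothetical/item/" + values.join("-") + ".json",
    value: { values: values }
  };
}
```

Because xdmp.eval executes the string as code, use this conversion only on items produced by your own source query.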

Add an interceptor or a custom hook to a step.

A custom hook runs as a separate server transaction, and the custom hook module can save changes to the database within that transaction.

In contrast, an interceptor runs in the same transaction as the core step processes; i.e., immediately after the core step processes and before the data is saved.


Flow of step with interceptor and pre-step and post-step custom hooks

Learn more about interceptors and custom hooks.

To add an interceptor to a step:

  • In Hub Central,
    1. Edit the Advanced Settings of the step.
    2. Enter a JSON array in the Interceptor field, with one object containing information about each interceptor module.
        [
          {
            "path": "/uri/of/custom/module/in/modules/database/a.sjs",
            "vars": {
              "myParameter": "myParameterValue"
            },
            "when": "beforeContentPersisted"
          }
        ]
      

To add a custom hook to a step:

  • In Hub Central,
    1. Edit the Advanced Settings of the step.
    2. Enter a JSON object in the Custom Hook field with information about the custom hook module.
        {
          "module" : "/custom-modules/your-step-type/your-hook-directory/your-hook-module-name.sjs",
          "parameters" : {},
          "user" : "data-hub-operator",
          "runBefore" : false
        }
      
  • Add a Custom Hook to a Step Manually

Create custom mapping functions.

To customize the mapping process, you can create custom mapping functions to use in your mapping definition, in addition to the predefined Data Hub mapping functions.