Customizing Steps

Various ways to customize how your data is processed.

You can customize a step in multiple ways.

Replace a predefined process.

To replace a predefined process (ingestion/loading, mapping, or mastering), create a custom step and choose the appropriate custom step type.

Using Hub Central:

Using Gradle:

Retrieve source data with script.

The source query can be:

  • a CTS query that retrieves a set of URIs, or
  • a custom script that retrieves a set of items of any type, including URIs.

Important: MarkLogic recommends writing source queries that return documents based on collection tags or information in the documents.

Avoid writing source queries that return documents based on URIs. When you load/ingest delimited text files into MarkLogic Server, new URIs are randomly generated on every load. To generate the same URIs each time you load delimited text files, use MLCP.

Non-URI items must be handled by a custom step module.

To define the source data passed to a custom step module, manually set the source query to a script in the flow configuration file.

  1. In the flow configuration file, manually modify the custom step section:
      ...
      "sourceQuery" : "string-containing-javascript-code",
      "sourceQueryIsScript" : true,
      ...
    
    • Set sourceQuery to a string containing any script that can be passed to xdmp.eval.
    • Set sourceQueryIsScript to true.

Example: To retrieve values from an index for the step to process:

   "sourceQuery" : "cts.values([cts.elementReference('someIndexedProperty')], null, null, cts.collectionQuery('some-collection'))",

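Because sourceQuery holds JavaScript code inside a JSON string, quoting is easy to get wrong when editing the flow file by hand. The following is a minimal sketch, runnable in plain Node.js outside MarkLogic, that builds the two properties as an object and lets JSON.stringify handle the escaping. The index and collection names are the placeholders from the example above; the variable names are hypothetical.

```javascript
// The script that Data Hub would evaluate, kept as an ordinary string.
const sourceQuery =
  "cts.values([cts.elementReference('someIndexedProperty')], null, null, " +
  "cts.collectionQuery('some-collection'))";

// Only the two properties described in the text; the rest of the step is omitted.
const stepFragment = {
  sourceQuery: sourceQuery,
  sourceQueryIsScript: true
};

// JSON.stringify escapes any characters that are not legal inside a JSON string.
const json = JSON.stringify(stepFragment, null, 2);
console.log(json);
```

The printed fragment can then be pasted into the custom step section of the flow configuration file.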
To retrieve source documents in Data Hub 6.0, customize the step by manually setting the sourceModule configuration:

   "selectedSource" : "sourceModule",
   "sourceQueryIsModule" : true,
   "sourceModule" : {
     "modulePath" : "/uri/of/the/collector/module",
     "functionName" : "collectorFunctionNameTobeInvoked"
   }
		
Important: MarkLogic recommends returning URIs (Uniform Resource Identifiers) from the collector.

Other options as parameters

You can refer to the value of any option using cts.elementReference(options.name-of-option). Example:

   "sourceQuery" : "cts.valueTuples([cts.elementReference(options.myOption)])",

The option can be:

  • included in the options section of the same step, or
  • specified at runtime.

Note: The Gradle command hubRunFlow does not apply in Data Hub 6.1. Use the Data Hub Client JAR to run flows instead.

Items as arrays

If you use a function that returns arrays, such as cts.valueTuples or cts.elementValueCoOccurrences, in your sourceQuery script, the arrays are serialized into strings before being grouped into batches. Therefore, in your custom step module, you must convert each item back to an array before processing.

You can use xdmp.eval to perform the conversion. Example:

   const array = fn.head(xdmp.eval(content.uri));
Note: The item passed to the step module is always stored in content.uri, whether the item is a URI or a serialized array.
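The round trip can be illustrated outside MarkLogic with a small sketch. It assumes the serialized item is a JSON array literal such as ["Smith", 42], which is an assumption about the format and not something the text above specifies, and it swaps in JSON.parse for xdmp.eval so that it runs in plain Node.js. The helper name itemToArray is hypothetical.

```javascript
// Hypothetical helper: turn the item a step module receives back into an array.
// Assumption: content.uri holds a JSON array literal. Inside MarkLogic, the
// documented approach is fn.head(xdmp.eval(content.uri)) rather than JSON.parse.
function itemToArray(content) {
  const parsed = JSON.parse(content.uri);
  if (!Array.isArray(parsed)) {
    throw new TypeError('Expected a serialized array in content.uri');
  }
  return parsed;
}

// Example item, shaped the way the note above describes: the payload is in uri.
const content = { uri: '["Smith", 42]' };
const [lastName, count] = itemToArray(content);
console.log(lastName, count);
```

Checking that the result really is an array guards against a step that is accidentally run with a source query returning plain URIs.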

Add an interceptor to a step.

Note: Custom hooks are deprecated. Use interceptors instead.

An interceptor runs in the same transaction as the core step processes; that is, it runs immediately after the core step processes complete and before the data is saved.

Learn more about interceptors.

To add an interceptor to a step:

  • In Hub Central,
    1. Edit the Advanced Settings of the step.
    2. Enter a JSON array in the Interceptor field, with one object per interceptor module.
         [
           {
             "path": "/uri/of/custom/module/in/modules/database/a.sjs",
             "vars": {
               "myParameter": "myParameterValue"
             },
             "when": "beforeContentPersisted"
           }
         ]
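
As a rough illustration of what the module at the configured path might do, the sketch below shows interceptor-style logic as a plain function so that it can be run in Node.js outside MarkLogic. It assumes the interceptor receives an array of content objects, each carrying its document in a value property, along with the myParameter entry from vars. The function name, the value property, and the stampedWith field are assumptions for illustration and are not confirmed by the text above.

```javascript
// Hypothetical interceptor logic: annotate each content object before it is saved.
// Assumption: each content object exposes its document as content.value.
function stampContent(contentArray, myParameter) {
  for (const content of contentArray) {
    // Copy the existing value and add one field derived from the configured var.
    content.value = Object.assign({}, content.value, { stampedWith: myParameter });
  }
  return contentArray;
}

// Example run with one fake content object and the value from the "vars" section.
const batch = [{ uri: '/example/1.json', value: { name: 'Alice' } }];
stampContent(batch, 'myParameterValue');
console.log(JSON.stringify(batch));
```

In a real interceptor module, the content array and the vars entries would be supplied by Data Hub rather than passed as function arguments.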
      

Create custom mapping functions.

To customize the mapping process, you can create custom mapping functions to use in your mapping definition, in addition to the predefined Data Hub mapping functions.
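
A custom mapping function is typically a small, side-effect-free function that transforms one source value. The following is a minimal sketch of what one might look like; the function name removePrefix and its behavior are hypothetical, and the CommonJS export shown is an assumption about how the module exposes its functions. How the module is deployed and made available to a mapping definition is not shown here.

```javascript
// Hypothetical custom mapping function: strips a known prefix from a source value.
function removePrefix(value, prefix) {
  const text = String(value);
  // Leave the value unchanged when the prefix is absent.
  return text.startsWith(prefix) ? text.slice(prefix.length) : text;
}

// Assumption: functions are exposed to the mapping through a module export.
module.exports = { removePrefix };
```

A mapping expression could then call the function on a source property, for example to turn a value such as ID-123 into 123.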