Creating a Custom Hook Module
Custom Hook Modules
A custom hook allows you to perform tasks that are outside the scope of Data Hub. Custom hook modules can run immediately before (pre-step hook) or immediately after (post-step hook) the step's core processes.
A step is generally processed as follows:
- The selected data is read from the source database.
- The pre-step hook module runs, if any.
- The main step module runs. This can be default Data Hub step functionality or a custom step module.
- The post-step hook module runs, if any.
- The processed data is written to the target database.
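The sequence above can be modeled with a short sketch in plain JavaScript. It only illustrates the ordering and is not Data Hub's implementation; `runStep` and the `preHook`, `main`, and `postHook` fields are hypothetical names introduced for this example.

```javascript
// Illustrative model of the step sequence described above.
// All names here (runStep, preHook, main, postHook) are hypothetical;
// this is not Data Hub's internal code.
function runStep(step, readSource, writeTarget) {
  const trace = [];

  const content = readSource();          // 1. read the selected data
  trace.push("read");

  if (step.preHook) {                    // 2. pre-step hook, if any
    step.preHook(content);
    trace.push("pre-hook");
  }

  const processed = step.main(content);  // 3. main step module
  trace.push("main");

  if (step.postHook) {                   // 4. post-step hook, if any
    step.postHook(processed);
    trace.push("post-hook");
  }

  writeTarget(processed);                // 5. write to the target
  trace.push("write");
  return trace;
}

// Example: a step that has only a post-step hook.
const written = [];
const trace = runStep(
  {
    main: (docs) => docs.map((d) => ({ envelope: { instance: d } })),
    postHook: (docs) => console.log("post-step hook saw", docs.length, "document(s)")
  },
  () => [{ id: 1 }],
  (docs) => written.push(...docs)
);
console.log(trace.join(" -> ")); // prints "read -> main -> post-hook -> write"
```

In this model the post-step hook receives the output of the main module, which matches the later note that a post-step hook sees content already wrapped in an envelope.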
A custom hook is added to the step in the flow definition file as a JSON node.
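As a rough illustration, the node might look like the object built below, which is written in JavaScript so that it can be serialized and inspected. Only `runBefore` is named on this page; the property names `module`, `parameters`, and `user`, the `customHook` key, the module path, and the collection name are assumptions and should be checked against the flow definition reference for your Data Hub version.

```javascript
// Sketch of a custom hook node for a step in the flow definition file.
// Assumptions: the "customHook" key and the property names "module",
// "parameters", and "user" are not confirmed by this page; the path and
// the collection name are hypothetical values.
const customHook = {
  module: "/custom-modules/archive-duplicate-orders.sjs", // hypothetical module path
  parameters: { archiveCollection: "archived-orders" },   // hypothetical hook property
  user: "",                                               // assumed optional; left empty here
  runBefore: false                                        // false = run as a post-step hook
};

// Serialize to the JSON form that would be placed inside the step.
const flowFragment = JSON.stringify({ customHook }, null, 2);
console.log(flowFragment);
```

Setting `runBefore` to `true` would make the same module a pre-step hook instead.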
If a process in the custom hook module conflicts with a process in the step's core module or functionality, the core module or functionality overrides the hook module's conflicting process.
Each custom hook module is executed in its own environment, separate from Data Hub processes and other modules.
Custom hooks can be added to any type of step (ingestion, mapping, matching, merging, mastering, or custom).
Store all your custom modules under your-project-root/src/main/ml-modules/root/custom-modules. The Gradle task mlDeploy deploys the contents of that directory to the MODULES database.
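The path you later reference from the flow is the module's URI in the MODULES database rather than its location in the project. The sketch below shows one way to derive that URI, assuming that files under `src/main/ml-modules/root` are deployed with URIs relative to that directory; `toModuleUri` and the file name `my-hook.sjs` are hypothetical.

```javascript
// Sketch: derive a module URI from a project-relative file path.
// Assumption: everything under src/main/ml-modules/root is deployed with a URI
// relative to that directory. toModuleUri is a hypothetical helper.
const MODULES_ROOT = "src/main/ml-modules/root";

function toModuleUri(projectRelativePath) {
  // Tolerate Windows-style separators.
  const normalized = projectRelativePath.replace(/\\/g, "/");
  if (!normalized.startsWith(MODULES_ROOT + "/")) {
    throw new Error("Not under " + MODULES_ROOT + ": " + projectRelativePath);
  }
  // Keep the leading slash so the result is an absolute module URI.
  return normalized.slice(MODULES_ROOT.length);
}

console.log(toModuleUri("src/main/ml-modules/root/custom-modules/my-hook.sjs"));
// prints "/custom-modules/my-hook.sjs"
```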
Required Inputs and Outputs
A custom hook module does not require specific inputs or outputs. However, it can access some information by declaring and using the following variables in the code.
// A custom hook module receives values for the following parameters via Data Hub. You can declare only the ones you need and ignore the rest.
var uris; // An array of one or more URIs being processed.
var content; // An array of objects that represent each document being processed.
var options; // The Options object passed to the step by Data Hub.
var flowName; // The name of the flow being processed.
var stepNumber; // The index of the step within the flow being processed. The stepNumber of the first step is 1.
var step; // The step definition object.
var database; // The target database.
Example
The following custom hook module example archives an existing record before a new one with the same URI is ingested, so that the new record starts with a fresh history.
/**
* This is a simple example of a custom hook that determines if the incoming order is a duplicate of an existing order
* in the STAGING database. If so, the existing order is archived.
* An update transaction in a custom hook has less impact than an update in the main module.
*/
declareUpdate();
// A custom hook module receives values for the following parameters via Data Hub. You can declare only the ones you need and ignore the rest.
var uris; // An array of one or more URIs being processed.
var content; // An array of objects that represent each document being processed.
var options; // The Options object passed to the step by Data Hub.
var flowName; // The name of the flow being processed.
var stepNumber; // The index of the step within the flow being processed. The stepNumber of the first step is 1.
var step; // The step definition object.
var database; // The target database.
// Custom hooks can define zero or more properties in the step definition that declares them.
var archiveCollection;
for (const contentObject of content) {
  const order = contentObject.value;
  /**
   * If a hook is configured with runBefore = true, then the content value will be the "raw" data, not yet wrapped in
   * an envelope. If it's configured with runBefore = false, as in this example, then the content value
   * will be an envelope.
   */
  const instance = order.envelope.instance;
  /**
   * Note that for better performance, a single query should be run for all of the objects in the content
   * array. Running one query per object, as done here, is acceptable only for the small set of data being
   * ingested in this example.
   */
  const existingDuplicateOrders = cts.search(cts.andQuery([
    cts.collectionQuery("IngestOrders"),
    cts.jsonPropertyValueQuery("id", instance.id),
    cts.jsonPropertyValueQuery("customer", instance.customer),
    cts.jsonPropertyValueQuery("order_date", instance.order_date),
    cts.jsonPropertyValueQuery("product_id", instance.product_id)
  ]));
  for (const duplicateOrder of existingDuplicateOrders) {
    const duplicateUri = xdmp.nodeUri(duplicateOrder);
    // Generate a random URI so that previously archived documents are never overwritten
    const archiveUri = "/archive/" + sem.uuidString() + duplicateUri;
    xdmp.documentInsert(archiveUri, duplicateOrder, xdmp.documentGetPermissions(duplicateUri), archiveCollection);
    xdmp.documentDelete(duplicateUri);
  }
}
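The comment in the example recommends a single query for the whole content array. One possible way to build such a query is sketched below: each order contributes an and-query over its identifying properties, and the per-order queries are combined with `cts.orQuery`. `buildDuplicateQuery` is a hypothetical helper, and `cts` is passed in as a parameter only so that the function can be tried outside MarkLogic with a stand-in object; inside a hook you would pass the global `cts` object and give the result to `cts.search`.

```javascript
// Sketch: build one query that covers every incoming order.
// buildDuplicateQuery is a hypothetical helper, not part of Data Hub.
function buildDuplicateQuery(cts, instances) {
  const perOrder = instances.map((instance) =>
    cts.andQuery([
      cts.jsonPropertyValueQuery("id", instance.id),
      cts.jsonPropertyValueQuery("customer", instance.customer),
      cts.jsonPropertyValueQuery("order_date", instance.order_date),
      cts.jsonPropertyValueQuery("product_id", instance.product_id)
    ])
  );
  // A document matches if it is in the collection and duplicates any one order.
  return cts.andQuery([
    cts.collectionQuery("IngestOrders"),
    cts.orQuery(perOrder)
  ]);
}

// Stand-in for cts that records the query shape, used only to try the helper
// outside MarkLogic. It does not run any search.
const fakeCts = {
  andQuery: (queries) => ({ and: queries }),
  orQuery: (queries) => ({ or: queries }),
  collectionQuery: (name) => ({ collection: name }),
  jsonPropertyValueQuery: (property, value) => ({ property, value })
};

const query = buildDuplicateQuery(fakeCts, [
  { id: 1, customer: "A", order_date: "2020-01-01", product_id: 10 },
  { id: 2, customer: "B", order_date: "2020-01-02", product_id: 20 }
]);
console.log(JSON.stringify(query, null, 2));
```

A combined search returns every matching document but does not indicate which incoming order each one duplicates. That is sufficient for this example, where every match is archived and deleted in the same way.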
Next Steps
After creating your custom hook module, add a custom hook to your step in the flow and specify the path to your new custom hook module.