Creating a Custom Step Module

Custom Step Modules

A custom step lets you add your own functionality anywhere in the flow sequence.

To perform specialized tasks in Data Hub, you can create a custom step that calls a custom step module.

Required Inputs

  • Content object. An object that contains everything that you want to process. The input can also be an array of Content objects, one per document or record. Each Content object consists of the following:
    • content.uri. The URI of the document or record to process.
    • content.context. All the metadata associated with the document or record (found in the database, but not included in the envelope). Examples: permissions, collection tags, temporal settings.
    • content.value. The information to process in the custom module.
  • Options object. Custom parameters, expressed as JSON key-value pairs, that are passed to the step.
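
The inputs listed above can be pictured as follows. This is only a sketch: plain JavaScript objects stand in for the values that Data Hub passes to the module, and the URI, the property names inside context, the field values, and the describeInput helper are assumptions made for illustration.

```javascript
// Hypothetical inputs, shaped as described in the list above.
// The URI, collection name, and field values are made up for illustration.
const content = {
  uri: "/customers/1001.json",
  context: {
    collections: ["Customer"], // assumed property name for collection tags
    permissions: []            // assumed property name for permissions
  },
  value: { Postal: "94105-1234", name: "Example Customer" }
};

// Options are key-value parameters passed to the step.
const options = { outputFormat: "json" };

// Hypothetical helper: a custom module reads the Content properties and any options it needs.
function describeInput(content, options) {
  return content.uri + " -> " + (options.outputFormat || "default format");
}
```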

Required Outputs

  • One or more Content objects containing the processed data to be written to the database. Each Content object consists of the following:
    • content.uri. The URI of the document or record to overwrite or create in the database. To create a new document, use a URI that does not already exist in the database. If a document with the same URI already exists, its data is overwritten and the changes are logged in the provenance data.
    • content.context. All the metadata to associate with the document or record.
    • content.value. The information to store in the database.
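
As a rough illustration of the output contract, the sketch below shows a main function that returns a single Content object with all three properties set. It deliberately uses no Data Hub libraries; the sample URI, the processed flag, and the sample values are hypothetical.

```javascript
// Minimal sketch of a step module's main function, without any Data Hub libraries.
function main(content, options) {
  // Copy the incoming value and add a field (the 'processed' flag is hypothetical).
  const value = Object.assign({}, content.value, { processed: true });

  return {
    uri: content.uri,         // same URI, so an existing document would be overwritten
    context: content.context, // metadata carried through unchanged
    value: value              // the data to store in the database
  };
}

// Example call with made-up input.
const result = main({ uri: "/orders/42.json", context: {}, value: { total: 10 } }, {});
```

Returning a new object rather than mutating the input is a choice made for this sketch; the example later on this page modifies and returns the incoming content object instead.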

Best Practices

  • Although you can code your custom module in XQuery, MarkLogic recommends using JavaScript.
  • Use the DataHub object, which gives you access to the Data Hub libraries. For example, the DataHub object can generate an envelope around an XML or a JSON document.
  • You can throw an error inside the module and handle it in another step.
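
To make the envelope idea concrete, the following sketch shows the general shape of an envelope around a JSON instance, with headers, triples, and instance sections. The makeEnvelopeSketch function is a hypothetical stand-in written for this page, not the Data Hub implementation; in a real module, use the DataHub object instead.

```javascript
// Stand-in for illustration only; not the Data Hub implementation.
function makeEnvelopeSketch(instance, headers, triples) {
  return {
    envelope: {
      headers: headers || {},  // metadata about the record
      triples: triples || [],  // semantic triples, if any
      instance: instance || {} // the record data itself
    }
  };
}

// Example call with made-up values.
const wrapped = makeEnvelopeSketch({ Postal: "94105" }, { source: "example" }, []);
```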

Example

The following custom module example enriches the data by adding longitude and latitude information based on the zip code.


      const DataHub = require("/data-hub/5/datahub.sjs");
      const datahub = new DataHub();
      const zipcodeData = require("/custom-modules/utils/zipcodeData.sjs"); // Load module for mapping latitude and longitude from zip codes

      function main(content, options) {

        // Get the document's ID/URI.
        let id = content.uri;

        // Get and manipulate the context metadata associated with the document.
        let context = content.context;

        // Set the format of the output.
        let outputFormat = options.outputFormat ? options.outputFormat.toLowerCase() : datahub.flow.consts.DEFAULT_FORMAT;

        // Verify that the output is in either XML or JSON format, not in binary or other text format.
        if (outputFormat !== datahub.flow.consts.JSON && outputFormat !== datahub.flow.consts.XML) {
          // Build the message once, then log it and throw it.
          const message = 'The output format of type ' + outputFormat + ' is invalid. Valid options are ' + datahub.flow.consts.XML + ' or ' + datahub.flow.consts.JSON + '.';
          datahub.debug.log({message: message, type: 'error'});
          throw Error(message);
        }

        /*
        This scaffolding assumes that the document is retrieved from the source database.

        If you are adding information to (instead of modifying values in) an existing record in the database,
        you can create an instance (object), headers (object), and triples (array) using data from content.value,
        instead of calling the flowUtils functions to extract information from the document retrieved from MarkLogic Server.
        In addition, you do not have to verify that the document exists.

        Example code for using data that is sent to MarkLogic Server for the document
        let instance = content.value;
        let triples = [];
        let headers = {};
        */

        // Verify that the record is still in the database before processing it. Not required when creating new records.
        // 'fn' is the MarkLogic Server global object that provides built-in functions such as docAvailable.
        if (!fn.docAvailable(id)) {
          const message = 'The document with the uri: ' + id + ' could not be found.';
          datahub.debug.log({message: message, type: 'error'});
          throw Error(message);
        }

        // Set the 'doc' variable to the value of 'content.value'.
        let doc = content.value;

        // If doc is a document node (Document or XMLDocument) rather than another type of Node, such as ObjectNode or XMLNode, use its root node.
        if (doc && (doc instanceof Document || doc instanceof XMLDocument)) {
          doc = fn.head(doc.root);
        }

        // Get the instance. If the instance is not found, getInstance() returns an empty object.
        // If the document is inside an envelope, the default format is envelope/instance.
        // If you have a custom format, get the instance using: doc.xpath('/xpath/here')
        let instance = datahub.flow.flowUtils.getInstance(doc).toObject() || {};

        // Get the triples. Default to an empty array if none are found.
        let triples = datahub.flow.flowUtils.getTriples(doc) || [];

        // Get the headers. Default to an empty object if none are found.
        let headers = datahub.flow.flowUtils.getHeaders(doc) || {};

        // Store the original envelope contents as attachments.
        // Alternatively, to attach the entire document, use this instead: instance['$attachments'] = doc;
        instance['$attachments'] = {
          envelope: {
            headers: datahub.flow.flowUtils.getHeaders(doc) || {},
            triples: datahub.flow.flowUtils.getTriples(doc) || [],
            instance: datahub.flow.flowUtils.getInstance(doc)
          }
        };

        // Insert code to manipulate the instance, triples, headers, uri, context metadata, etc.
        if (instance["Postal"]) {
          let zipcode = instance["Postal"].substring(0, 5);
          instance["latitude"] = zipcodeData.getLatitude(zipcode);
          instance["longitude"] = zipcodeData.getLongitude(zipcode);
        }

        // Create the envelope using the specified output format, and set the content.value to the envelope.
        let envelope = datahub.flow.flowUtils.makeEnvelope(instance, headers, triples, outputFormat);
        content.value = envelope;

        // Assign the URI. In this example, the URI is the same.
        content.uri = id;

        // Assign the context.
        content.context = context;

        // Return the content to be written.
        return content;
      }

      module.exports = {
        main: main
      };
    

Next Steps

After creating your custom step module, create a custom step using Gradle and configure it manually to use your new module.