About Flow and Step Configuration Structures

Comparison of flow and step definition files in the QuickStart format and in the Hub Central format.

Overview

DHS project artifacts can be in the QuickStart format or in the Hub Central format. Each format is essentially a different JSON schema.

QuickStart and Hub Central can only read the project artifacts in the format intended for that tool; however, most Gradle tasks can read both formats.

A Comparison of Two Formats

Step configuration location
  • Hub Central format: Step configurations are in their own separate files in the your-project-root/steps/step-type directory, and the flow configuration structure includes references to them.
  • QuickStart format: Step configurations are embedded in the flow configuration structure.
Mapping configuration location
  • Hub Central format: The mapping configuration is embedded in the mapping step configuration file under $.properties.
  • QuickStart format: The mapping configuration is in a separate file in the your-project-root/mappings/flow-name-step-name directories.
Step definition hierarchy
  • Hub Central format: The hierarchy in the step definition is flatter, without the $.options and $.fileLocations objects.
  • QuickStart format: Some properties are embedded inside the $.options and $.fileLocations objects.
Different property names
  • Hub Central format: sourceFormat, targetFormat
  • QuickStart format: inputFileFormat, outputFormat
Additional properties (Hub Central format only)
  • stepId
  • selectedSource
  • lastUpdated
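As a rough illustration of the property-name difference, the following JavaScript sketch renames the two QuickStart-style keys to the Hub Central-style keys. It assumes that inputFileFormat corresponds to sourceFormat and that outputFormat corresponds to targetFormat; that pairing is inferred from the order in which the names are listed and is not confirmed here. The function name renameFormatProperties and the sample object are hypothetical.

```javascript
// Assumed pairing of QuickStart names to Hub Central names (inferred, not confirmed).
const RENAMES = new Map([
  ["inputFileFormat", "sourceFormat"],
  ["outputFormat", "targetFormat"]
]);

// Returns a shallow copy of a step configuration object with the keys renamed.
// Keys that are not listed in RENAMES are copied unchanged.
function renameFormatProperties(stepConfig) {
  const result = {};
  for (const [key, value] of Object.entries(stepConfig)) {
    const newKey = RENAMES.has(key) ? RENAMES.get(key) : key;
    result[newKey] = value;
  }
  return result;
}

// Hypothetical sample; real step configurations contain many more properties.
const sampleStep = { name: "ingest-customers", inputFileFormat: "csv", outputFormat: "json" };
console.log(renameFormatProperties(sampleStep));
```

The sketch only renames top-level keys. It does not flatten the $.options or $.fileLocations objects.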
Hub Central format: In the Matching step configuration file, the match rulesets are stored in matchRulesets » matchRules » "matchType": "MATCH-TYPE".
  • Exact. The matchType key has the value exact. The match type does not use any options.
    Example
     
    "matchRulesets": [
      {
        "name": "name - Exact",
        "weight": 3.5, 
        "matchRules": [
          {
            "entityPropertyPath": "name",
            "matchType": "exact",
            "options": {}
          } 
        ]
      }
    ],
    
  • Synonym. The matchType key has the value synonym. The match type uses the thesaurusURI and filter options.
    Example
     
    "matchRulesets": [
      {
        "name": "name - Synonym",
        "weight": 3.5, 
        "matchRules": [
          {
            "entityPropertyPath": "name",
            "matchType": "synonym",
            "options": {
              "thesaurusURI": "/thesauri/name-synonyms.xml",
              "filter": "<qualifier>english</qualifier>"
            }
          } 
        ]
      }
    ],
    
  • Double Metaphone. The matchType key has the value doubleMetaphone. The match type uses the dictionaryURI and distanceThreshold options.
    Example
     
    "matchRulesets": [
      {
        "name": "name - Double Metaphone",
        "weight": 3.5, 
        "matchRules": [
          {
            "entityPropertyPath": "name",
            "matchType": "doubleMetaphone",
            "options": {
              "dictionaryURI": "/nameDictionary.json",
              "distanceThreshold": 100
            }
          } 
        ]
      }
    ],
    
  • Zip. The matchType key has the value zip. The match type does not use any options.
    Example
     
    "matchRulesets": [
      {
        "name": "name - Zip",
        "weight": 1.5, 
        "matchRules": [
          {
            "entityPropertyPath": "name",
            "matchType": "zip",
            "options": {}
          } 
        ]
      }
    ],
    
  • Reduce. The matchType key has the value exact. The weight is negative when the reduce key has the value true. The match type does not use any options.
    Example
     
    "matchRulesets": [
      {
        "name": "name - Reduce",
        "weight": 1.5, 
        "reduce": true,  
        "matchRules": [
          {
            "entityPropertyPath": "address",
            "matchType": "exact",
            "options": {}
          },
          {
            "entityPropertyPath": "lastName",
            "matchType": "exact",
            "options": {}
          } 
        ]
      }
    ],
    
  • Custom. The matchType key has the value custom. The custom module is defined using algorithmModuleNamespace, algorithmModulePath, and algorithmFunction. The match type does not use any options.
    Example
     
    "matchRulesets": [
      {
        "name": "name - Custom",
        "weight": 1.5, 
        "matchRules": [
          {
            "entityPropertyPath": "name",
            "matchType": "custom",
            "algorithmModuleNamespace": "",
            "algorithmModulePath": "/custom-modules/matching/nameMatch.sjs",
            "algorithmFunction": "nameMatch",
            "options": {}
          } 
        ]
      }
    ],
    
QuickStart format: In the Matching step configuration file, the match rulesets are stored in scoring.
  • Exact. Stored in scoring » add. The match type does not use any options.
    Example
     
    "scoring": {
      "add": [
        {
          "propertyName": "lastName",
          "weight": "3.5"
        }
      ]
    },
    
  • Synonym. Stored in scoring » expand. The algorithmRef key has the value thesaurus. The match type uses the thesaurus and filter options.
    Example
     
    "scoring": {
      "expand": [
        {
          "propertyName": "name",
          "algorithmRef": "thesaurus",
          "weight": "2.5",
          "thesaurus": "/thesauri/name-synonyms.xml", 
          "filter": "<qualifier>english</qualifier>"
        }
      ]
    },
    
  • Double Metaphone. Stored in scoring » expand. The algorithmRef key has the value double-metaphone. The match type uses the dictionary and distanceThreshold options.
    Example
     
    "scoring": {
      "expand": [
        {
          "propertyName": "name",
          "algorithmRef": "double-metaphone",
          "weight": "2.5",
          "dictionary": "/nameDictionary.json",
          "distanceThreshold": "100"
        }
      ]
    },
    
  • Zip. Stored in scoring » expand. The algorithmRef key has the value zip-match. The match type uses the zip array that contains origin and weight properties.
    Example
     
    "scoring": {
      "expand": [
        {
          "propertyName": "name",
          "algorithmRef": "zip-match",
          "zip": [
            {"origin": "5", "weight": "1.5"},
            {"origin": "9", "weight": "1"}
          ]
        }
      ]
    },
    
  • Reduce. Stored in scoring » reduce. The algorithmRef key has the value standard-reduction. The property array contains the entity properties that must exactly match to reduce the score. The match type does not use any options.
    Example
     
    "scoring": {
      "reduce": [
        {
          "allMatch": {
            "property" : [ "address", "lastName" ]
          }
        },
        {
          "algorithmRef": "standard-reduction",
          "weight": "3.5"
        }
      ]
    },
    
  • Custom. Stored in scoring » expand. The algorithmRef key has the value custom-name-match. The custom functions are defined in algorithms » algorithm.
    Example
     
    "algorithms": {
      "algorithm": [
        {
          "name": "custom-name-match",
          "function": "nameMatch",
          "at": "/custom-modules/matching/nameMatch.sjs",
          "namespace": ""
        }
      ]
    },
    "scoring": {
      "expand": [
        {
          "propertyName": "name",
          "algorithmRef": "custom-name-match",
          "weight": "2.5"
        }
      ]
    },
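To make the correspondence between the scoring structure and the matchRulesets structure concrete, here is a minimal JavaScript sketch that converts scoring » add entries and two kinds of scoring » expand entries into matchRulesets entries. It follows only the key names visible in the examples in this section. The function names toRuleset and convertScoring are hypothetical, the ruleset naming pattern is copied from the examples, and zip, reduce, and custom rules are deliberately left out.

```javascript
// Builds one matchRulesets-style entry. Weights are strings in the scoring
// examples and numbers in the matchRulesets examples, so they are converted.
function toRuleset(propertyName, matchType, label, weight, options) {
  return {
    name: `${propertyName} - ${label}`,
    weight: Number(weight),
    matchRules: [{ entityPropertyPath: propertyName, matchType, options }]
  };
}

// Converts a scoring object into an array of matchRulesets entries (sketch only).
function convertScoring(scoring) {
  const rulesets = [];
  for (const rule of scoring.add || []) {
    rulesets.push(toRuleset(rule.propertyName, "exact", "Exact", rule.weight, {}));
  }
  for (const rule of scoring.expand || []) {
    if (rule.algorithmRef === "thesaurus") {
      rulesets.push(toRuleset(rule.propertyName, "synonym", "Synonym", rule.weight, {
        thesaurusURI: rule.thesaurus,
        filter: rule.filter
      }));
    } else if (rule.algorithmRef === "double-metaphone") {
      rulesets.push(toRuleset(rule.propertyName, "doubleMetaphone", "Double Metaphone", rule.weight, {
        dictionaryURI: rule.dictionary,
        distanceThreshold: Number(rule.distanceThreshold)
      }));
    }
    // Other algorithmRef values (zip-match, custom algorithms) are skipped here.
  }
  return rulesets;
}

const converted = convertScoring({
  add: [{ propertyName: "lastName", weight: "3.5" }],
  expand: [{
    propertyName: "name",
    algorithmRef: "thesaurus",
    weight: "2.5",
    thesaurus: "/thesauri/name-synonyms.xml",
    filter: "<qualifier>english</qualifier>"
  }]
});
console.log(JSON.stringify(converted, null, 2));
```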
    
Hub Central format: In the Matching step configuration file, all actions, including custom actions, are stored in thresholds. Custom actions are defined using actionModuleNamespace, actionModulePath, and actionModuleFunction.
Example
 
"thresholds": [
  {
    "thresholdName": "similarThreshold",
    "action": "notify",
    "score": 6.5
  },
  {
    "thresholdName": "household",
    "action": "custom",
    "score": 8.5,
    "actionModulePath": "/custom-modules/matching/householdAction.xqy",
    "actionModuleNamespace": "http://marklogic.com/smart-mastering/action",
    "actionModuleFunction": "household-action"
  }
],
QuickStart format: In the Matching step configuration file, custom actions are stored in actions » action. The action type is stored in thresholds » threshold » action.
Example
 
"actions": {
  "action": [
    {
      "name": "household-action",
      "function": "household-action",
      "namespace": "http://marklogic.com/smart-mastering/action",
      "at": "/custom-modules/matching/custom-action.xqy"
    }
  ]
},
"thresholds": {
  "threshold": [
    {
      "above": "6.5",
      "label": "similarThreshold",
      "action": "notify"
    },
    {
      "above": "8.5",
      "label": "household",
      "action": "household-action"
    }
  ]
},
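The relationship between the actions » action and thresholds » threshold structures and the flat thresholds array can be sketched in JavaScript as follows. This is an illustration based only on the key names in the examples in this section; the function name convertThresholds is hypothetical, and it assumes that a threshold whose action matches the name of an entry in actions » action is a custom action.

```javascript
// Converts { actions: { action: [...] }, thresholds: { threshold: [...] } }
// into a flat array of threshold objects (sketch only).
function convertThresholds(config) {
  // Index custom actions by name so thresholds can look them up.
  const customActions = new Map(
    ((config.actions || {}).action || []).map((action) => [action.name, action])
  );
  const thresholds = (config.thresholds || {}).threshold || [];
  return thresholds.map((threshold) => {
    const result = {
      thresholdName: threshold.label,
      action: threshold.action,
      score: Number(threshold.above) // "above" is a string in the source examples
    };
    const custom = customActions.get(threshold.action);
    if (custom) {
      result.action = "custom";
      result.actionModulePath = custom.at;
      result.actionModuleNamespace = custom.namespace;
      result.actionModuleFunction = custom.function;
    }
    return result;
  });
}

const flatThresholds = convertThresholds({
  actions: {
    action: [{
      name: "household-action",
      function: "household-action",
      namespace: "http://marklogic.com/smart-mastering/action",
      at: "/custom-modules/matching/custom-action.xqy"
    }]
  },
  thresholds: {
    threshold: [
      { above: "6.5", label: "similarThreshold", action: "notify" },
      { above: "8.5", label: "household", action: "household-action" }
    ]
  }
});
console.log(JSON.stringify(flatThresholds, null, 2));
```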
Hub Central format: Custom functions access values from the Matching step configuration file as follows:
XQuery Example
 
declare function algorithm:match-via-tde-row(
  $values as item()*,
  $match-rule as object-node(),
  $match-step as object-node()
) as cts:query*
{
  let $property-name := $match-rule/entityPropertyPath
  let $entity-type := $match-step/targetEntityType
  let $property-column := sem:iri("http://marklogic.com/column/" 
      || fn:replace($entity-type, "^.*/([^/]+/[^/]+)$", "$1") || "/" || $property-name)
  return
    cts:triple-query((), $property-column, $values)
};
JavaScript Example
 
function matchViaTdeRow(values, matchRule, matchStep)
{
  let propertyName = matchRule.entityPropertyPath;
  let entityType = matchStep.targetEntityType;
  let propertyColumn = sem.iri("http://marklogic.com/column/" 
      + fn.replace(entityType, "^.*/([^/]+/[^/]+)$", "$1") + "/" + propertyName);
  return cts.tripleQuery(null, propertyColumn, values);
}
QuickStart format: Custom functions access values from the Matching step configuration file as follows:
XQuery Example
 
declare function algorithm:match-via-tde-row(
  $values as item()*,
  $expand-rule as element(match:expand),
  $match-configuration as element(match:options)
) as cts:query*
{
  let $property-name := $expand-rule/@property-name
  let $entity-type := $match-configuration/match:target-entity
  let $property-column := sem:iri("http://marklogic.com/column/" 
      || fn:replace($entity-type, "^.*/([^/]+/[^/]+)$", "$1") || "/" || $property-name)
  return
    cts:triple-query((), $property-column, $values)
};
JavaScript Example
 
function matchViaTdeRow(values, expandRule, matchConfiguration)
{
  let propertyName = expandRule.propertyName;
  let entityType = matchConfiguration.targetEntity;
  let propertyColumn = sem.iri("http://marklogic.com/column/" 
      + fn.replace(entityType, "^.*/([^/]+/[^/]+)$", "$1") + "/" + propertyName)
  return cts.tripleQuery(null, propertyColumn, values);
};
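The string manipulation in these functions can be tried outside MarkLogic. The sketch below reproduces only the fn.replace step in plain JavaScript to show which part of the entity type IRI ends up in the column IRI. The function name columnIri and the sample entity type IRI are hypothetical, and the result is a plain string rather than the IRI value that sem.iri returns.

```javascript
// Keeps the last two path segments of the entity type IRI and appends the
// property name, mirroring the pattern used in the examples above.
function columnIri(entityType, propertyName) {
  const pattern = new RegExp("^.*/([^/]+/[^/]+)$");
  const suffix = entityType.replace(pattern, "$1");
  return "http://marklogic.com/column/" + suffix + "/" + propertyName;
}

// Hypothetical entity type IRI, for illustration only.
console.log(columnIri("http://example.org/Customer-0.0.1/Customer", "name"));
// prints "http://marklogic.com/column/Customer-0.0.1/Customer/name"
```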
Hub Central format: In the Merging step configuration file, the last updated date and time is stored in lastUpdatedLocation » documentXPath.
Example
 
{
  "mergeStrategies": [],
  "mergeRules": [],
  "lastUpdatedLocation": {
    "namespaces": {
      "es": "http://marklogic.com/entity-services",
      "sm": "http://marklogic.com/smart-mastering"
    },
    "documentXPath": "/es:envelope/es:headers/sm:sources/sm:source/sm:dateTime"
  }
}
QuickStart format: In the Merging step configuration file, the last updated date and time is stored in algorithms » stdAlgorithm » timestamp » path.
Example
 
 {
   "algorithms": {
     "stdAlgorithm": {
       "namespaces": {
         "sm": "http://marklogic.com/smart-mastering",
         "es": "http://marklogic.com/entity-services"
       },
       "timestamp": {
         "path": "/es:envelope/es:headers/sm:sources/sm:source/sm:dateTime"
       }
     }
   }
 }
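The two locations of the last updated path differ only in nesting and key names. A minimal JavaScript sketch of moving the value from algorithms » stdAlgorithm » timestamp » path to lastUpdatedLocation » documentXPath could look like this; the function name convertLastUpdated is hypothetical, and the sketch assumes that the namespaces object carries over unchanged.

```javascript
// Returns a lastUpdatedLocation-style object, or undefined when the input
// has no timestamp path (sketch only).
function convertLastUpdated(mergeConfig) {
  const std = (mergeConfig.algorithms || {}).stdAlgorithm;
  if (!std || !std.timestamp || !std.timestamp.path) {
    return undefined;
  }
  return {
    namespaces: { ...(std.namespaces || {}) },
    documentXPath: std.timestamp.path
  };
}

const lastUpdatedLocation = convertLastUpdated({
  algorithms: {
    stdAlgorithm: {
      namespaces: {
        sm: "http://marklogic.com/smart-mastering",
        es: "http://marklogic.com/entity-services"
      },
      timestamp: {
        path: "/es:envelope/es:headers/sm:sources/sm:source/sm:dateTime"
      }
    }
  }
});
console.log(JSON.stringify(lastUpdatedLocation, null, 2));
```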
Hub Central format: In the Merging step configuration file, merge rulesets are stored in mergeRules.
Example
 
{
  "mergeRules": [
    {
      "entityPropertyPath": "name",
      "maxSources": 1,
      "priorityOrder": {
        "lengthWeight": 2,
        "sources": [
          {
            "sourceName": "favoriteSource",
            "weight": 12
          },
          {
            "sourceName": "lessFavoriteSource",
            "weight": 10
          }
        ]
      }
    }
  ]
}
QuickStart format: In the Merging step configuration file, merge rulesets are stored in merging.
Example
 
 {
   "merging": [
     {
       "propertyName": "name",
       "algorithmRef": "standard",
       "length": {
         "weight": "2"
       },
       "name": "myFavoriteSource",
       "maxSources": 1,
       "sourceWeights": [
         {
           "source": {
             "name": "favoriteSource",
             "weight": "12"
           }
         },
         {
           "source": {
             "name": "lessFavoriteSource",
             "weight": "10"
           }
         }
       ]
     }
   ],
   "propertyDefs": {
     "properties": [
       {
         "localname": "name",
         "name": "name"
       }
     ]
   }
 }
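The merging and mergeRules examples carry the same information under different key names. The following JavaScript sketch maps one shape onto the other, using only the keys that appear in the examples. The function name convertMergeRules is hypothetical, and the sketch assumes that entityPropertyPath corresponds to the localname that propertyDefs associates with the rule's propertyName.

```javascript
// Converts merging entries into mergeRules-style entries (sketch only).
function convertMergeRules(mergeConfig) {
  // Map each propertyDefs name to its localname.
  const localNames = new Map(
    ((mergeConfig.propertyDefs || {}).properties || []).map((p) => [p.name, p.localname])
  );
  return (mergeConfig.merging || []).map((rule) => {
    const result = {
      entityPropertyPath: localNames.get(rule.propertyName) || rule.propertyName
    };
    if (rule.maxSources !== undefined) {
      result.maxSources = Number(rule.maxSources);
    }
    const priorityOrder = {};
    if (rule.length && rule.length.weight !== undefined) {
      priorityOrder.lengthWeight = Number(rule.length.weight);
    }
    if (Array.isArray(rule.sourceWeights)) {
      priorityOrder.sources = rule.sourceWeights.map(({ source }) => ({
        sourceName: source.name,
        weight: Number(source.weight)
      }));
    }
    if (Object.keys(priorityOrder).length > 0) {
      result.priorityOrder = priorityOrder;
    }
    return result;
  });
}

const mergeRules = convertMergeRules({
  merging: [{
    propertyName: "name",
    algorithmRef: "standard",
    length: { weight: "2" },
    maxSources: 1,
    sourceWeights: [
      { source: { name: "favoriteSource", weight: "12" } },
      { source: { name: "lessFavoriteSource", weight: "10" } }
    ]
  }],
  propertyDefs: { properties: [{ localname: "name", name: "name" }] }
});
console.log(JSON.stringify(mergeRules, null, 2));
```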
Hub Central format: In the Merging step configuration file, custom merge functions are stored in mergeRules. The custom merge function is defined using entityPropertyPath, mergeModulePath, and mergeModuleFunction.
Example
 
{
  "mergeRules": [
    {
      "entityPropertyPath": "addressLocalName",
      "mergeModulePath": "/custom/merge/strategy.sjs",
      "mergeModuleFunction": "mergeAddress",
      "options": {}
    }
  ]
}
QuickStart format: In the Merging step configuration file, custom merge functions are stored in algorithms » custom. The custom merge function is defined using name, function, and at.
Example
 
 {
   "propertyDefs": {
     "properties": [
       {
         "localname": "addressLocalName",
         "name": "addressName"
       }
     ]
   },
   "algorithms": {
     "custom": [
       {
         "name": "addressAlgorithm",
         "function": "mergeAddress",
         "at": "/custom/merge/strategy.sjs"
       }
     ]
   },
   "merging": [
     {
       "propertyName": "addressName",
       "algorithmRef": "addressAlgorithm"
     }
   ]
 }
Hub Central format: Custom functions access values from the Merging step configuration file as follows:
XQuery Example
 
declare function algorithm:custom-merge-limit(
  $property-name as xs:QName,
  $properties as map:map*,
  $merge-rule as object-node()
) as map:map*
{
  let $default-limit := if ($merge-rule/entityPropertyPath = "Phone") then 5 else 10
  return fn:subsequence(
        $properties,
        1,
        fn:head(($merge-rule/maxValues, $default-limit))
  )
};
JavaScript Example
 
function customMergeLimit(propertyName, properties, mergeRule)
{
  let defaultLimit = mergeRule.entityPropertyPath === "Phone" ? 5 : 10;
  return fn.subsequence(
        properties,
        1,
        mergeRule.maxValues || defaultLimit
  );
}
QuickStart format: Custom functions access values from the Merging step configuration file as follows:
XQuery Example
 
declare function algorithm:custom-merge-limit(
  $property-name as xs:QName,
  $properties as map:map*,
  $merge-rule as element(merging:merge)
) as map:map*
{
  let $default-limit := if ($merge-rule/@property-name = "Phone") then 5 else 10
  return fn:subsequence(
        $properties,
        1,
        fn:head(($merge-rule/@max-values, $default-limit))
  )
};
JavaScript Example
 
function customMergeLimit(propertyName, properties, mergeRule)
{
  let defaultLimit = mergeRule.propertyName === "Phone" ? 5 : 10;
  return fn.subsequence(
        properties,
        1,
        mergeRule.maxValues || defaultLimit
  );
}
Hub Central format: To create a flow, run the Gradle task hubCreateFlow without -PwithInlineSteps. It creates multiple files: one for the flow configuration and one for each of the example step configurations.

Learn more: Create Flow Using Gradle

QuickStart format: To create a flow, run the Gradle task hubCreateFlow with -PwithInlineSteps=true. It creates a single file containing the flow configuration with embedded example steps.

Learn more: Create Flow Using Gradle

Hub Central format: To create a step and add it to a flow, run the Gradle tasks hubCreateStep and hubAddStepToFlow.

Learn more: Create Steps Using Gradle - HC Format

QuickStart format: To create a step and add it to a flow, run the Gradle task hubCreateStepDefinition, then manually copy the step configuration structure from the new step definition file to the appropriate location in the flow configuration structure.

In either format, you must customize the example steps before running the flow. You can delete the steps you don't need, and you can duplicate the steps if you need multiple steps of the same type. However, you must assign a unique sequence number for each step.

Tip: You can convert your flows and steps into the Hub Central format even if you intend to use only Gradle.