Migrating 4.3 Flows to 5.x and 6.x Steps
Data Hub 5.x Steps
- A step is equivalent to a Data Hub Framework 4.3 flow. See About Steps.
- A flow is a sequence of steps. See About Flows.
Migrating your 4.3 flows into Data Hub 5.x steps is optional. You can continue to run your DHF 4.3 flows inside Data Hub 5.x using legacy Gradle tasks.
However, to take advantage of Data Hub 5.x features, such as mastering and improved mapping, you must migrate your DHF 4.3 flows into Data Hub 5.x steps in one of the following ways, or a combination of both, and then add your Data Hub 5.x steps to one or more flows.
Additionally, if you intend to use Hub Central, you must also convert your models, steps, and flows to the Hub Central format. Learn more: convert your project artifacts from QuickStart to Hub Central
Use Configuration-Based Steps
You can create Data Hub 5.x ingestion steps and mapping steps to replace your DHF 4.3 input flows and harmonize flows, respectively.
DHF 4.3 Flows | Data Hub 5.x Steps |
---|---|
input flow | ingestion step |
harmonize flow | mapping step |
You can customize these steps by adding interceptor or custom hook modules that run before or after the default step processes.
- You can take greater advantage of Data Hub 5.x features.
- Configuration-based steps are more compatible with current and future Data Hub 5.x features.
- Less code to maintain, although you might initially have to trim down your existing plugins so that they do not duplicate the default step processes. If your plugins perform additional processes beyond the defaults, you can move the associated code into an interceptor or a custom hook. Learn more: About Interceptors and Custom Hooks
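As an illustration of moving plugin logic into an interceptor, suppose a legacy headers plugin stamped each document with its source system. The function below is a minimal sketch of that same logic applied to a batch of content objects. It assumes each item looks like { uri, value } with a plain-object envelope in value. The function name and the sourceSystem header are hypothetical, and how Data Hub registers an interceptor on a step and passes content to it is covered in About Interceptors and Custom Hooks, not shown here.

```javascript
// Hypothetical interceptor logic: add a header to every enveloped document.
// Assumes each item is { uri, value } where value is a plain JS object shaped
// like { envelope: { headers: {...}, instance: {...} } }.
// The items are modified in place and the same array is returned.
function addSourceSystemHeader(contentArray, sourceSystem) {
  for (const content of contentArray) {
    const envelope = content.value && content.value.envelope;
    if (!envelope) {
      // Leave items that are not wrapped in an envelope untouched.
      continue;
    }
    envelope.headers = envelope.headers || {};
    envelope.headers.sourceSystem = sourceSystem;
  }
  return contentArray;
}

// Usage with two sample items; only the first has an envelope.
const batch = [
  { uri: "/customers/1.json", value: { envelope: { headers: {}, instance: { id: 1 } } } },
  { uri: "/customers/2.json", value: { raw: true } }
];
addSourceSystemHeader(batch, "crm");
console.log(batch[0].value.envelope.headers.sourceSystem); // → crm
```

Keeping the logic in a small function that only touches the headers means the default step processes still build the rest of the envelope.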
Learn how to create ingestion and mapping steps, as well as entities and flows: Getting Started
Use Custom Steps
You can create Data Hub 5.x custom steps and add your DHF 4.3 plugin code (as is or rewritten) to each custom step's module.
DHF 4.3 Flows | Data Hub 5.x Steps |
---|---|
input flow | custom-ingestion step |
harmonize flow | custom-mapping step |
- Ideal if your plugins perform extensive custom processes that the Data Hub 5.x configuration-based steps do not handle.
- Initially faster to implement because you don't need to change your headers, triples, or content code. However, you must maintain your modules to keep them compatible with future Data Hub releases.
- If you copy your plugin code as is into the custom step module, you might be importing unnecessary libraries for functionality that the Data Hub 5.x configuration-based steps already handle. These libraries could adversely impact performance.
- Requires more technical expertise to maintain.
See Getting Started to learn how to create custom steps, as well as entities and flows.
See Editing a Custom Step Module.
Examples:
- To add your DHF 4.3 plugin code as is, see an example project in which a Data Hub 5.x custom step calls the headers, triples, and content JavaScript libraries copied from a DHF 4.3 project.
- To rewrite your DHF 4.3 plugin code into a native Data Hub 5.x module, see an example that uses the custom step code template as a guide.
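The following is a minimal sketch of a custom step module that reuses DHF 4.3-style plugin logic. It assumes the main(content, options) entry point used by Data Hub 5.x custom step modules and uses plain JavaScript objects in place of MarkLogic document nodes. The three plugin functions are hypothetical stand-ins for your copied content, headers, and triples libraries; their parameters are illustrative and do not necessarily match your 4.3 plugins.

```javascript
// Hypothetical stand-ins for DHF 4.3 plugin libraries copied into the module.
// In a real project these would come from the copied content/headers/triples files.
function createContent(source, options) {
  return { customerId: source.id, fullName: `${source.first} ${source.last}` };
}

function createHeaders(content, options) {
  return { entity: options.entity };
}

function createTriples(content, headers, options) {
  return [];
}

// Custom step entry point, modeled on main(content, options).
// Assumes content.value is a plain object; in MarkLogic it is a document node
// that would need to be converted (for example, with toObject()) first.
function main(content, options) {
  const source = content.value;
  const instance = createContent(source, options);
  const headers = createHeaders(instance, options);
  const triples = createTriples(instance, headers, options);
  // Wrap the results in a 4.3-style envelope, keeping the source as an attachment.
  content.value = {
    envelope: { headers, triples, instance, attachments: source }
  };
  return content;
}

module.exports = { main };
```

For example, main({ uri: "/c/1.json", value: { id: 1, first: "Ada", last: "Lovelace" } }, { entity: "Customer" }) returns the same URI with an envelope whose instance has fullName "Ada Lovelace".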
Data Hub 6.x Steps
In Data Hub 6.x, the Data Hub 5.x definitions of steps and flows still apply.
You can upgrade your 4.3.x legacy flows to 6.x steps with the hubUpgradeLegacyFlows command.
Step Naming Conventions:
- Step names must start with a letter (A-Z) and can contain only letters, numbers, hyphens, and underscores.
- If a legacy flow name contains special characters (for example, #, $, or *), the upgrade removes them from the 6.x step name.
- If a legacy flow name begins with a special character or a number, the upgrade adds "dh_" to the beginning of the step name.
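The naming rules above can be sketched as two small functions. This is one interpretation of the documented conventions, not Data Hub's actual implementation: it treats any character other than an ASCII letter, digit, hyphen, or underscore as a special character, and it decides whether to add "dh_" by looking at the first character of the original legacy flow name.

```javascript
// Sketch of the documented step naming conventions (an interpretation,
// not Data Hub's actual implementation).
function toStepName(legacyFlowName) {
  // Remove every character other than ASCII letters, digits, hyphens, and underscores.
  const cleaned = legacyFlowName.replace(/[^A-Za-z0-9_-]/g, "");
  // Names that originally begin with a digit or special character get a "dh_" prefix.
  const startsWithLetter = /^[A-Za-z]/.test(legacyFlowName);
  return startsWithLetter ? cleaned : `dh_${cleaned}`;
}

// A step name is valid if it starts with a letter and contains only allowed characters.
function isValidStepName(name) {
  return /^[A-Za-z][A-Za-z0-9_-]*$/.test(name);
}

console.log(toStepName("load$acme#tech")); // → loadacmetech
console.log(toStepName("2019-orders"));    // → dh_2019-orders
console.log(toStepName("#orders"));        // → dh_orders
```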
For Input Flow:
- The hubUpgradeLegacyFlows command creates a custom ingestion step and related artifacts for every 4.3.2 input flow.
- When the command upgrades a 4.3.2 input flow to 6.x, it creates a step definition with the same name as the flowName of the 4.3.2 input flow and with ("type" : "INGESTION").
- The command places a custom module in the src/main/ml-modules/root/custom-modules/ingestion/{step-def-name} directory. The step-definition artifact contains the path of the module.
- The command creates a step artifact whose stepDefinitionType is ingestion and whose stepDefinitionName is the name of the new step definition. The step name and the stepDefinitionName are the same.
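The relationships between the generated artifacts can be modeled as follows. This is a hypothetical sketch that covers only the properties described in this list; the real artifacts contain more (for example, the module path in the step definition). It assumes the flow name already follows the step naming conventions, and the stepId pattern (step name plus "-ingestion") is inferred from the stepId values in this section's examples.

```javascript
// Hypothetical model of what the upgrade produces for one 4.3.2 input flow.
// Only the properties described in this section are modeled.
// Assumes flowName already follows the step naming conventions.
function ingestionArtifactsFor(flowName) {
  const stepDefinition = {
    name: flowName,          // same name as the legacy flowName
    type: "INGESTION"
  };
  const step = {
    name: flowName,          // step name === stepDefinitionName
    stepDefinitionName: flowName,
    stepDefinitionType: "ingestion",
    // Pattern inferred from the stepId values in this section's examples.
    stepId: `${flowName}-ingestion`
  };
  // Project directory that receives the generated custom module.
  const moduleDir = `src/main/ml-modules/root/custom-modules/ingestion/${flowName}`;
  return { stepDefinition, step, moduleDir };
}

const artifacts = ingestionArtifactsFor("load-acme-tech");
console.log(artifacts.step.stepId); // → load-acme-tech-ingestion
console.log(artifacts.moduleDir);   // → src/main/ml-modules/root/custom-modules/ingestion/load-acme-tech
```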
Step Options for Custom Ingestion
Input Flow to Custom Ingestion Step:
- Update inputFilePath manually. The upgrade does not configure its value.
- The sourceFormat and targetFormat values might not match your settings. You might need to configure them manually.
The generated step looks like the following. Its name also follows the step naming conventions.
{
"name" : "name-of-the-legacy-flow",
"description" : "",
"stepDefinitionType" : "ingestion",
"stepDefinitionName" : "name-of-the-legacy-flow",
"targetDatabase" : "staging database name as defined in gradle.properties",
"sourceFormat" : "dataFormat as defined in legacyflow.properties file",
"targetFormat" : "dataFormat as defined in legacyflow.properties file",
"options" : {
"flow" : "name-of-the-legacy-flow",
"entity" : "",
"dataFormat" : "dataFormat as defined in legacyflow.properties file",
"mainModuleUri" : "legacy flow main module uri"
},
"collections" : [ "step-name", "entity-name"],
"permissions" : "data-hub-common,read,data-hub-common,update",
"stepId" : "load-acme-tech-ingestion",
"isUpgradedLegacyFlow" : true,
"inputFilePath" : ""
}
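Because inputFilePath, and possibly sourceFormat and targetFormat, must be set by hand, a small script can apply those settings to the generated step file. This is a hypothetical helper, not part of Data Hub; it assumes the step is stored as a JSON file like the one above, and the file path you pass is up to you.

```javascript
const fs = require("fs");

// Hypothetical helper: fill in the settings the upgrade does not configure
// automatically. Returns a new object and leaves the input step unchanged.
function applyManualIngestionSettings(step, settings) {
  const { inputFilePath, sourceFormat, targetFormat } = settings;
  if (!inputFilePath) {
    throw new Error("inputFilePath must be set manually after the upgrade");
  }
  return {
    ...step,
    inputFilePath,
    // Keep the generated formats unless an override is supplied.
    sourceFormat: sourceFormat || step.sourceFormat,
    targetFormat: targetFormat || step.targetFormat
  };
}

// Read a step JSON file, apply the settings, and write it back.
function updateStepFile(stepFilePath, settings) {
  const step = JSON.parse(fs.readFileSync(stepFilePath, "utf8"));
  const updated = applyManualIngestionSettings(step, settings);
  fs.writeFileSync(stepFilePath, JSON.stringify(updated, null, 2) + "\n");
  return updated;
}
```

For example, calling updateStepFile with your step file and { inputFilePath: "/data/acme/orders", sourceFormat: "csv" } sets the input path and source format while keeping the generated targetFormat.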
For Harmonize Flow:
- The hubUpgradeLegacyFlows command creates a custom step and related artifacts for every 4.3.2 harmonization flow.
- When the command upgrades a 4.3.2 harmonize flow to 6.x, it creates a step definition with the same name as the flowName of the 4.3.2 harmonize flow and with ("type" : "CUSTOM").
- The command places a custom module in the src/main/ml-modules/root/custom-modules/custom/{step-def-name} directory. The step-definition artifact contains the path of the module.
- The command creates a step artifact whose stepDefinitionType is custom and whose stepDefinitionName is the name of the new step definition. The step name and the stepDefinitionName are the same.
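The custom module is placed under src/main/ml-modules/root in the project, and ml-gradle deploys files in that directory to module URIs relative to it. The sketch below derives such a URI from a project path. The main.sjs file name in the example is a hypothetical placeholder; check the generated directory for the actual file name.

```javascript
// Sketch: derive the module URI that a file under src/main/ml-modules/root
// receives when the project's modules are deployed.
const MODULES_ROOT = "src/main/ml-modules/root";

function moduleUriFor(projectPath) {
  // Accept Windows-style separators as well.
  const normalized = projectPath.replace(/\\/g, "/");
  if (!normalized.startsWith(MODULES_ROOT + "/")) {
    throw new Error(`${projectPath} is not under ${MODULES_ROOT}`);
  }
  // Keep the leading "/" that follows the root directory.
  return normalized.slice(MODULES_ROOT.length);
}

const stepDefName = "harmonize-acme-tech";
console.log(moduleUriFor(`src/main/ml-modules/root/custom-modules/custom/${stepDefName}/main.sjs`));
// → /custom-modules/custom/harmonize-acme-tech/main.sjs
```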
Step Options for Custom Step
The custom step shares most of its properties with the custom ingestion step and adds the source properties at the end of the example: sourceDatabase, sourceQueryIsModule, selectedSource, and sourceModule.
{
"name" : "name-of-the-legacy-flow",
"description" : "",
"stepDefinitionType" : "custom",
"stepDefinitionName" : "name-of-the-legacy-flow",
"targetDatabase" : "final database name as defined in gradle.properties",
"options" : {
"flow" : "name-of-the-legacy-flow",
"entity" : "entity-dir-name",
"dataFormat" : "dataFormat as defined in legacyflow.properties file",
"mainModuleUri" : "legacy flow main module uri"
},
"collections" : [ "step-name", "entity-name" ],
"permissions" : "data-hub-common,read,data-hub-common,update",
"stepId" : "harmonize-acme-tech-custom",
"isUpgradedLegacyFlow" : true,
"sourceDatabase" : "staging database name as defined in gradle.properties",
"sourceQueryIsModule" : true,
"selectedSource" : "sourceModule",
"sourceModule" : {
"modulePath" : "legacy flow collector module uri",
"functionName" : "collect"
}
}
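The properties that distinguish an upgraded harmonize flow can be checked mechanically, for example after editing the step by hand. The function below is a hypothetical validation sketch based only on the properties shown in this section; Data Hub does not provide it.

```javascript
// Hypothetical check of the properties described in this section for an
// upgraded harmonize flow (custom step). Returns a list of problems.
function checkUpgradedCustomStep(step) {
  const problems = [];
  if (step.isUpgradedLegacyFlow !== true) {
    problems.push("isUpgradedLegacyFlow should be true");
  }
  if (step.stepDefinitionType !== "custom") {
    problems.push("stepDefinitionType should be custom");
  }
  if (step.name !== step.stepDefinitionName) {
    problems.push("name and stepDefinitionName should match");
  }
  if (step.selectedSource !== "sourceModule" || step.sourceQueryIsModule !== true) {
    problems.push("step should select its source with sourceModule");
  }
  const source = step.sourceModule || {};
  if (!source.modulePath || !source.functionName) {
    problems.push("sourceModule needs modulePath and functionName");
  }
  return problems;
}

const exampleStep = {
  name: "harmonize-acme-tech",
  stepDefinitionName: "harmonize-acme-tech",
  stepDefinitionType: "custom",
  isUpgradedLegacyFlow: true,
  selectedSource: "sourceModule",
  sourceQueryIsModule: true,
  sourceModule: { modulePath: "legacy flow collector module uri", functionName: "collect" }
};
console.log(checkUpgradedCustomStep(exampleStep)); // → []
```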
Using the sourceModule Function
When you upgrade your legacy flows to 6.x, Data Hub adds the following sourceModule properties to each generated custom step. They point the step at a function in the legacy flow's collector module, which selects the items to process:
{
"selectedSource" : "sourceModule",
"sourceQueryIsModule" : true,
"sourceModule" : {
"modulePath" : "/uri/of/the/collector/module",
"functionName" : "collectorFunctionNameTobeInvoked"
}
}
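The following simplified, hypothetical model shows how these properties fit together: look up the module at modulePath, call the function named by functionName, and treat the returned identifiers as the items to process. It is not Data Hub's implementation. An in-memory object stands in for the deployed modules, and the collector URI and returned document URIs are illustrative placeholders.

```javascript
// Simplified, hypothetical model of sourceModule resolution.
// An in-memory registry stands in for modules deployed to the modules database.
const modules = {
  // Stand-in for a DHF 4.3 collector module deployed at this (illustrative) URI.
  "/entities/Order/harmonize/harmonize-orders/collector.sjs": {
    collect(options) {
      // A real collector would typically query the staging database.
      return ["/orders/1.json", "/orders/2.json"];
    }
  }
};

function selectSourceItems(step, options) {
  if (step.selectedSource !== "sourceModule" || step.sourceQueryIsModule !== true) {
    throw new Error("step does not use a source module");
  }
  const { modulePath, functionName } = step.sourceModule;
  const mod = modules[modulePath];
  if (!mod || typeof mod[functionName] !== "function") {
    throw new Error(`cannot invoke ${functionName} in ${modulePath}`);
  }
  // The returned identifiers become the items the step processes.
  return mod[functionName](options);
}
```

For example, a step whose sourceModule has modulePath "/entities/Order/harmonize/harmonize-orders/collector.sjs" and functionName "collect" yields the two order URIs.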