Overview of flows in Data Hub.

About Flows

A flow defines a sequence of one or more steps, which are modules that process and enhance your data. A flow declares which steps are executed, in what order, and with which options.

For example, a flow can consist of an ingestion step that pulls in your raw data, followed by a mapping step that specifies how the fields of your raw data are assigned to the properties of your entity model.

Each flow must contain at least one step. The number of steps in a flow is unlimited; however, flows with fewer steps can be easier to maintain and debug.
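The idea of a flow as an ordered list of steps, each with its own options, can be illustrated with a minimal Python sketch. This is not Data Hub's API or flow file format; the names Step, Flow, ingest, and map_fields, along with the sample options and field names, are hypothetical and exist only to show the concept of an ingestion step followed by a mapping step.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Step:
    """Hypothetical step: a name, a processing function, and its options."""
    name: str
    run: Callable[[List[Record], Dict[str, Any]], List[Record]]
    options: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Flow:
    """Hypothetical flow: steps are executed in the order they are listed."""
    name: str
    steps: List[Step]

    def __post_init__(self) -> None:
        # Mirrors the rule that a flow must contain at least one step.
        if not self.steps:
            raise ValueError("a flow must contain at least one step")

    def execute(self, records: List[Record]) -> List[Record]:
        # The output of each step becomes the input of the next one.
        for step in self.steps:
            records = step.run(records, step.options)
        return records


def ingest(records: List[Record], options: Dict[str, Any]) -> List[Record]:
    # Stand-in for ingestion: tag each raw record with its source.
    return [{**r, "source": options["source"]} for r in records]


def map_fields(records: List[Record], options: Dict[str, Any]) -> List[Record]:
    # Stand-in for mapping: rename raw fields to entity property names.
    mapping = options["mapping"]
    return [{mapping.get(k, k): v for k, v in r.items()} for r in records]


flow = Flow(
    name="customer-flow",
    steps=[
        Step("ingest", ingest, {"source": "crm-export"}),
        Step("map", map_fields, {"mapping": {"cust_nm": "name"}}),
    ],
)

result = flow.execute([{"cust_nm": "Ada"}])
```

In this sketch, the raw field cust_nm is first tagged with a source by the ingestion stand-in and then renamed to name by the mapping stand-in, so the order of the steps in the list determines the order of processing.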

Data Hub flows are not designed to replace your external orchestration tool (e.g., Apache NiFi); however, chaining multiple steps in your flows might reduce the complexity of the scenarios that your orchestration tool must handle.

Important: If you are upgrading from 4.x or an earlier version, be aware that the definition of a flow changed in Data Hub 5.x.
  • In Data Hub Framework 4.x, a flow (input or harmonization) consists of plugins that process the data.
  • In Data Hub 5.x, a flow is a sequence of steps (ingestion, mapping, matching, merging, mastering, or custom) that process the data.
v4.x and earlier      v5.0              v5.1
Input flow            Ingestion step    Ingestion step
Harmonization flow    Mapping step      Mapping step
-                     Mastering step    • Matching step
                                        • Merging step
                                        • Mastering step (combined matching and merging)
-                     Custom step       • Custom-Ingestion step
                                        • Custom-Mapping step
                                        • Custom-Mastering step
                                        • Custom-Other step