Overview of flows in Data Hub.

About Flows

A flow defines a sequence of one or more steps, which are modules that process and enhance your data. A flow declares which steps will be executed in what order and with which options.

For example, a flow can consist of an ingestion step that pulls in your raw data, followed by a mapping step that defines a one-to-one correspondence between the fields in your raw data and the fields in your entity model.

Each flow must contain at least one step. The number of steps in a flow is unlimited; however, flows with fewer steps can be easier to maintain and debug.
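
The idea of a flow as an ordered list of steps, each with its own options, can be illustrated with a small conceptual sketch. This is not Data Hub's API or its flow file format: the names Step, Flow, ingest, and map_fields, and the option keys raw_records and mapping, are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

Record = dict[str, Any]


@dataclass
class Step:
    """Hypothetical stand-in for a step: a named function plus its options."""
    name: str
    run: Callable[[list[Record], dict[str, Any]], list[Record]]
    options: dict[str, Any] = field(default_factory=dict)


@dataclass
class Flow:
    """Hypothetical stand-in for a flow: steps executed in declared order."""
    name: str
    steps: list[Step]

    def __post_init__(self) -> None:
        # Mirrors the rule that each flow must contain at least one step.
        if not self.steps:
            raise ValueError("a flow must contain at least one step")

    def run(self, records: Optional[list[Record]] = None) -> list[Record]:
        # The output of each step becomes the input of the next one.
        current = list(records or [])
        for step in self.steps:
            current = step.run(current, step.options)
        return current


def ingest(records: list[Record], options: dict[str, Any]) -> list[Record]:
    # Illustrative ingestion: append copies of the raw records from the options.
    return records + [dict(r) for r in options["raw_records"]]


def map_fields(records: list[Record], options: dict[str, Any]) -> list[Record]:
    # Illustrative mapping: rename each source field to its entity field.
    mapping = options["mapping"]
    return [{target: r[source] for source, target in mapping.items()}
            for r in records]


flow = Flow(
    name="customer-flow",
    steps=[
        Step("ingest", ingest,
             {"raw_records": [{"fname": "Ada", "lname": "Lovelace"}]}),
        Step("map", map_fields,
             {"mapping": {"fname": "firstName", "lname": "lastName"}}),
    ],
)
result = flow.run()
```

In this sketch, the order of the steps list is the execution order, and constructing a Flow with an empty list raises an error, which reflects the one-step minimum described above.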

Data Hub flows are not designed to replace your external orchestration tool (e.g., Apache NiFi); however, chaining multiple steps in your flows might reduce the complexity of the scenarios that your orchestration tool must handle.

Important: If you are upgrading from 4.x or an earlier version, be aware that the definition of a flow changed in Data Hub 5.0.
  • In Data Hub Framework 4.x, a flow (input or harmonization) is composed of plugins that process the data.
  • In Data Hub 5.0, a flow is a sequence of steps (ingestion, mapping, mastering, or custom) that process the data.

v4.x and earlier      v5.0
Input flow            Ingestion step
Harmonization flow    Mapping step
                      Mastering step
                      Custom step