Overview of flows in Data Hub.
A flow defines a sequence of one or more steps, which are modules that process and enhance your data. A flow declares which steps will be executed in what order and with which options.
For example, a flow can consist of an ingestion step that pulls in your raw data, followed by a mapping step that defines a one-to-one correspondence between the fields in your raw data and the fields in your entity model.
Each flow must contain at least one step. The number of steps in a flow is unlimited; however, flows with fewer steps can be easier to maintain and debug.
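The idea of a flow as an ordered sequence of steps, each with its own options, can be illustrated with a minimal Python sketch. This is a conceptual model only and not the Data Hub API or its flow file format: the `Step` and `Flow` classes, the `ingest` and `map_fields` functions, and the option names `collection` and `mapping` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Step:
    """Hypothetical step: a named module plus the options it runs with."""
    name: str
    run: Callable[[List[Record], Dict[str, Any]], List[Record]]
    options: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Flow:
    """Hypothetical flow: declares which steps run, in what order."""
    name: str
    steps: List[Step]

    def __post_init__(self) -> None:
        # Mirrors the rule that each flow must contain at least one step.
        if not self.steps:
            raise ValueError("a flow must contain at least one step")

    def execute(self, records: List[Record]) -> List[Record]:
        # Run the steps in declared order; each step receives the
        # output of the previous one.
        for step in self.steps:
            records = step.run(records, step.options)
        return records


def ingest(records: List[Record], options: Dict[str, Any]) -> List[Record]:
    # Illustrative ingestion: tag each raw record with a collection name.
    return [{**r, "collection": options["collection"]} for r in records]


def map_fields(records: List[Record], options: Dict[str, Any]) -> List[Record]:
    # Illustrative mapping: rename raw fields to entity-model fields,
    # leaving unmapped fields unchanged.
    mapping = options["mapping"]
    return [{mapping.get(k, k): v for k, v in r.items()} for r in records]


flow = Flow(
    "customer-flow",
    [
        Step("ingest", ingest, {"collection": "raw-customers"}),
        Step("map", map_fields, {"mapping": {"cust_nm": "customerName"}}),
    ],
)

result = flow.execute([{"cust_nm": "Ada"}])
```

In this sketch the raw field `cust_nm` ends up as `customerName` after the two steps run in order. Keeping the step list short, as in this two-step flow, is what makes it straightforward to trace which step produced which change.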
Data Hub flows are not designed to replace your external orchestration tool (e.g., Apache NiFi); however, chaining multiple steps in your flows might reduce the complexity of the scenarios that your orchestration tool must handle.
- In Data Hub Framework 4.x, a flow (input or harmonization) consists of plugins that process the data.
- In Data Hub 5.0, a flow is a sequence of steps (ingestion, mapping, mastering, or custom) that process the data.
|v4.x and earlier|v5.0|
|---|---|
|Input flow|Ingestion step|
|Harmonization flow|Mapping step|