Flows

Overview of flows in Data Hub.

About Flows

A flow defines a sequence of one or more steps, which are modules that process and enhance your data. A flow declares which steps run, in what order, and with which options.

For example, a flow can consist of an ingestion step that pulls in your raw data, followed by a mapping step that specifies how values are assigned to the properties of your entity model based on the fields of your raw data.
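To make the idea of an ordered, option-driven sequence of steps concrete, here is a minimal Python sketch of the ingestion-then-mapping example. It is not the Data Hub API or its flow file format. The classes `Flow` and `Step`, the step functions `ingest` and `map_to_entity`, and option keys such as `columns` and `properties` are all hypothetical, chosen only to show that each step receives its own options and that steps run strictly in the declared order. The check in `Flow` also reflects the rule that a flow must contain at least one step.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
# Hypothetical step signature: current records plus this step's options -> new records.
StepFn = Callable[[List[Record], Dict[str, Any]], List[Record]]


@dataclass
class Step:
    name: str
    fn: StepFn
    options: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Flow:
    name: str
    steps: List[Step]

    def __post_init__(self) -> None:
        # A flow must contain at least one step.
        if not self.steps:
            raise ValueError(f"flow {self.name!r} must contain at least one step")

    def run(self, records: List[Record]) -> List[Record]:
        # Steps execute in declared order; each one consumes the previous step's output.
        for step in self.steps:
            records = step.fn(records, step.options)
        return records


def ingest(records: List[Record], options: Dict[str, Any]) -> List[Record]:
    # Hypothetical ingestion step: split each raw line into named raw fields.
    sep = options.get("separator", ",")
    cols = options["columns"]
    return [dict(zip(cols, r["raw"].split(sep))) for r in records]


def map_to_entity(records: List[Record], options: Dict[str, Any]) -> List[Record]:
    # Hypothetical mapping step: assign entity properties from raw fields.
    mapping = options["properties"]  # entity property -> raw field name
    return [{prop: rec[src] for prop, src in mapping.items()} for rec in records]


flow = Flow("customerFlow", [
    Step("ingest-customers", ingest, {"columns": ["fname", "lname"]}),
    Step("map-customers", map_to_entity,
         {"properties": {"firstName": "fname", "lastName": "lname"}}),
])

result = flow.run([{"raw": "Ada,Lovelace"}, {"raw": "Alan,Turing"}])
print(result)
# [{'firstName': 'Ada', 'lastName': 'Lovelace'}, {'firstName': 'Alan', 'lastName': 'Turing'}]
```

Keeping per-step options on the step, rather than in global state, mirrors how a flow declares "which options" alongside "which steps"; reordering the list changes the execution order and nothing else.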

Each flow must contain at least one step. The number of steps in a flow is unlimited; however, flows with fewer steps can be easier to maintain and debug.

Data Hub flows are not designed to replace your external orchestration tool (e.g., Apache NiFi); however, chaining multiple steps in your flows might reduce the complexity of the scenarios that your orchestration tool must handle.

Important: If you are upgrading from 4.x or earlier versions, be aware that the definition of a flow changed in Data Hub 5.x.

In Data Hub Framework 4.x, a flow (input or harmonization) consists of plugins that process the data.
In Data Hub 5.x, a flow is a sequence of steps (ingestion, mapping, matching, merging, mastering, or custom) that process the data.

The following table shows how flow and step types correspond across versions.

v4.x and earlier	v5.0	v5.1	v5.4 (Hub Central format)
Input flow	Ingestion step	Ingestion step	Loading step
Harmonization flow	Mapping step	Mapping step	Mapping step
	Mastering step	Matching step, Merging step, Mastering step (combined matching and merging)
	Custom step	Custom-Ingestion step, Custom-Mapping step, Custom-Mastering step, Custom-Other step	Custom-Loading step, Custom step
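When planning an upgrade, it can help to treat the version table as a lookup. The Python sketch below transcribes only the two rows that have a 4.x equivalent (input and harmonization flows). The dictionary name, the version-key strings such as `"5.4 (Hub Central)"`, and the `replacement_steps` helper are hypothetical conveniences for illustration, not part of any Data Hub tooling.

```python
from typing import Dict, List

# Hypothetical upgrade aid transcribed from the version table: each 4.x flow type
# maps to the step types that take over its role in each later version.
LEGACY_FLOW_TO_STEPS: Dict[str, Dict[str, List[str]]] = {
    "input": {
        "5.0": ["ingestion"],
        "5.1": ["ingestion"],
        "5.4 (Hub Central)": ["loading"],
    },
    "harmonization": {
        "5.0": ["mapping"],
        "5.1": ["mapping"],
        "5.4 (Hub Central)": ["mapping"],
    },
}


def replacement_steps(legacy_flow_type: str, version: str) -> List[str]:
    """Return the step types that replace a 4.x flow type in the given version."""
    try:
        return LEGACY_FLOW_TO_STEPS[legacy_flow_type.lower()][version]
    except KeyError:
        raise ValueError(
            f"no mapping for {legacy_flow_type!r} in version {version!r}"
        ) from None


print(replacement_steps("Input", "5.4 (Hub Central)"))  # ['loading']
```

Mastering and custom steps are left out on purpose: the table lists no 4.x flow type for them, so there is nothing to translate from.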