Data Hub Glossary

Custom Hook Module

Custom code that can perform tasks outside the scope of Data Hub, either immediately before or immediately after a step's main processes. See also: About Custom Modules
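
As a rough illustration of where a hook runs, the Python sketch below wraps a step's main process with an optional before-hook and after-hook. It shows only the ordering. `run_step_with_hooks`, `before_hook`, and `after_hook` are hypothetical names, and real Data Hub hook modules are deployed inside MarkLogic rather than written in Python.

```python
from typing import Any, Callable, Dict, List, Optional

Records = List[Dict[str, Any]]
Hook = Callable[[Records], None]

def run_step_with_hooks(
    records: Records,
    main: Callable[[Records], Records],
    before_hook: Optional[Hook] = None,
    after_hook: Optional[Hook] = None,
) -> Records:
    """Run optional hooks immediately before and immediately after the step's main process."""
    if before_hook is not None:
        before_hook(records)   # e.g. tell an outside system that processing is starting
    result = main(records)
    if after_hook is not None:
        after_hook(result)     # e.g. send the processed output to an outside service
    return result

# Usage: record the order in which the hooks and the main process run.
events: List[str] = []

def main_process(records: Records) -> Records:
    events.append("main")
    return [dict(r, processed=True) for r in records]

out = run_step_with_hooks(
    [{"id": 1}],
    main=main_process,
    before_hook=lambda recs: events.append("before"),
    after_hook=lambda recs: events.append("after"),
)
```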

Custom Step Module

Custom code that can override default processes or perform additional tasks as the main component of a custom step in a flow. See also: About Custom Modules
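
A custom step module supplies the main processing that a built-in step would otherwise perform. The sketch below models that as a function that receives one record and returns the transformed record. The `main(content, options)` shape, the `uri`/`value` keys, and the `uppercase_fields` option are assumptions for illustration, not Data Hub's documented interface.

```python
from typing import Any, Dict

def main(content: Dict[str, Any], options: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical custom step: upper-cases the fields listed in options['uppercase_fields']."""
    value = dict(content["value"])          # copy so the input record is not mutated
    for field in options.get("uppercase_fields", []):
        if isinstance(value.get(field), str):
            value[field] = value[field].upper()
    return {"uri": content["uri"], "value": value}

result = main(
    {"uri": "/customer/1.json", "value": {"name": "ada", "city": "london"}},
    {"uppercase_fields": ["city"]},
)
```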

Entity

An XML or JSON representation of a high-level business object in your enterprise. Examples of business objects are employee, product, purchase order, and department. See also: Entities

Entity Services

An out-of-the-box API and a set of conventions you can use within MarkLogic to quickly set up an application based on entity modeling.

Flow

A chain of steps that process the data. You can create multiple flows that are configured to process your data differently. See also: About Flows
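
The idea of a flow as an ordered chain of steps can be sketched as function composition, where each step's output becomes the next step's input. The steps `drop_empty` and `tag_source` are hypothetical. The two flows reuse them in different configurations to show how multiple flows can process the same data differently.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]

def run_flow(steps: List[Step], records: List[Record]) -> List[Record]:
    """Pass the records through each step in order; each step's output feeds the next step."""
    for step in steps:
        records = step(records)
    return records

def drop_empty(records: List[Record]) -> List[Record]:
    """Hypothetical step: discard records that have no fields."""
    return [r for r in records if r]

def tag_source(records: List[Record]) -> List[Record]:
    """Hypothetical step: note where each record came from."""
    return [dict(r, source="crm") for r in records]

# Two flows built from the same steps but configured differently.
flow_a: List[Step] = [drop_empty, tag_source]
flow_b: List[Step] = [tag_source]
data: List[Record] = [{"id": 1}, {}]
```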

Flow tracing

The process that logs information about flows as they run. The inputs to and outputs from every plugin of every flow are recorded in the JOBS database.
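
Flow tracing can be pictured as a wrapper that records what goes into and comes out of each stage. In this sketch an in-memory list stands in for the JOBS database, and `traced` and `trace_log` are hypothetical names. The sketch does not reflect the format Data Hub actually writes.

```python
import copy
from typing import Any, Callable, Dict, List

trace_log: List[Dict[str, Any]] = []   # stand-in for the JOBS database

def traced(name: str, fn: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Wrap fn so that every call records its input and output under the given name."""
    def wrapper(data: Any) -> Any:
        entry: Dict[str, Any] = {"plugin": name, "input": copy.deepcopy(data)}
        result = fn(data)
        entry["output"] = copy.deepcopy(result)
        trace_log.append(entry)
        return result
    return wrapper

content = traced("content", lambda doc: {"name": doc["name"].strip()})
headers = traced("headers", lambda doc: {"ingested": True})
content({"name": "  Ada "})
headers({"name": "Ada"})
```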

Harmonization

The conceptual set of processes (typically mapping and Smart Mastering) that unify and merge similar data that is represented differently across various raw data sources. See also: Mapping and Smart Mastering

Ingestion

The Data Hub step that takes in your raw data, wraps each item in an envelope, and stores the wrapped items in the STAGING database for further processing. See also: step
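
The wrapping described above can be sketched as follows. The envelope keys (`headers`, `triples`, `instance`, `attachments`) follow the commonly used envelope pattern, but the header contents and the exact fields Data Hub writes should be treated as assumptions. A Python dict stands in for the STAGING database.

```python
import json
from typing import Any, Dict, List, Tuple

def wrap_in_envelope(item: Dict[str, Any], source_uri: str) -> Dict[str, Any]:
    """Wrap one raw item in an envelope; the raw item is kept unchanged under 'instance'."""
    return {
        "envelope": {
            "headers": {"sources": [{"name": source_uri}]},  # hypothetical header content
            "triples": [],
            "instance": item,
            "attachments": None,
        }
    }

def ingest(items: List[Tuple[str, Dict[str, Any]]], staging: Dict[str, str]) -> None:
    """Store each wrapped item, serialized as JSON, in a dict keyed by URI (stand-in for STAGING)."""
    for uri, item in items:
        staging[uri] = json.dumps(wrap_in_envelope(item, uri))

staging_db: Dict[str, str] = {}
ingest([("/raw/customer-1.json", {"CustName": "Ada", "Town": "London"})], staging_db)
```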

Mapping

The Data Hub step that associates the fields in the entity model with the corresponding fields in your source data. See also: step
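
In its simplest form, a mapping is a table from entity-model properties to source fields. The sketch below applies such a table to one source record. `CUSTOMER_MAPPING` and its field names are hypothetical, and real Data Hub mappings also support expressions and transformations that this sketch omits.

```python
from typing import Any, Dict

# Hypothetical mapping: entity-model property -> field name in the source data.
CUSTOMER_MAPPING: Dict[str, str] = {"name": "CustName", "city": "Town", "customerId": "ID"}

def apply_mapping(source: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Build an entity instance by copying each mapped source field to its entity property."""
    return {prop: source[field] for prop, field in mapping.items() if field in source}

source_record = {"ID": "C-100", "CustName": "Ada", "Town": "London", "Unused": 42}
customer = apply_mapping(source_record, CUSTOMER_MAPPING)
```

Unmapped source fields such as `Unused` are simply left out of the entity instance in this sketch.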

Mastering

The Data Hub step that uses MarkLogic Smart Mastering technology to check for possible duplicates in your data and handle them according to specified criteria. A mastering step consists of matching (to determine whether two or more records refer to the same entity) and merging (to determine how to combine the records that refer to the same entity). See also: step

Matching

The first process in a Data Hub mastering step. It checks for possible duplicates in your data based on your criteria, which are defined as rules and thresholds. See also: merging
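
One way to picture rules and thresholds is as weighted scoring. Each rule that fires adds its weight, and the total is compared against thresholds that name an action. The rule weights, the threshold values, and the action names `merge` and `notify` below are illustrative assumptions, not Smart Mastering's actual configuration format or scoring algorithm.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical rules: (property, weight); a rule fires when both records have the same non-null value.
RULES: List[Tuple[str, float]] = [("email", 50.0), ("lastName", 10.0), ("zip", 5.0)]
# Hypothetical thresholds: (minimum score, action).
THRESHOLDS: List[Tuple[float, str]] = [(50.0, "merge"), (15.0, "notify")]

def match_score(a: Dict[str, Any], b: Dict[str, Any]) -> float:
    """Sum the weights of every rule whose property is present and equal in both records."""
    return sum(w for prop, w in RULES if a.get(prop) is not None and a.get(prop) == b.get(prop))

def classify(a: Dict[str, Any], b: Dict[str, Any]) -> Optional[str]:
    """Return the action of the highest threshold the score reaches, or None if none is reached."""
    score = match_score(a, b)
    for minimum, action in sorted(THRESHOLDS, reverse=True):
        if score >= minimum:
            return action
    return None

r1 = {"email": "ada@example.com", "lastName": "Lovelace", "zip": "10001"}
r2 = {"email": "ada@example.com", "lastName": "Lovelace"}
r3 = {"lastName": "Lovelace", "zip": "10001"}
r4 = {"lastName": "Byron"}
```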

Merging

The second process in a Data Hub mastering step. It determines how matching records are combined. See also: matching
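
Merging is often configured per property, with a strategy chosen for each property. The sketch below uses two hypothetical strategies: `latest` keeps the value from the most recently updated record, and `union` combines list values. The strategy names, the `updated` field, and the `STRATEGIES` table are assumptions for illustration, not Smart Mastering's merge options.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Strategy = Callable[[List[Record], str], Any]

def latest(records: List[Record], prop: str) -> Any:
    """Take the value from the most recently updated record that has the property."""
    candidates = [r for r in records if prop in r]
    if not candidates:
        return None
    # ISO 8601 date strings sort chronologically, so max() picks the latest one.
    return max(candidates, key=lambda r: r.get("updated", ""))[prop]

def union(records: List[Record], prop: str) -> Any:
    """Combine list values from all records, dropping duplicates and keeping first-seen order."""
    seen: List[Any] = []
    for r in records:
        for v in r.get(prop, []):
            if v not in seen:
                seen.append(v)
    return seen

# Hypothetical merge configuration; properties without a strategy are dropped in this sketch.
STRATEGIES: Dict[str, Strategy] = {"address": latest, "phones": union, "updated": latest}

def merge(records: List[Record]) -> Record:
    """Combine records already judged to refer to the same entity into one record."""
    return {prop: fn(records, prop) for prop, fn in STRATEGIES.items()}

a = {"address": "1 Old St", "phones": ["555-0100"], "updated": "2020-01-01"}
b = {"address": "9 New Rd", "phones": ["555-0100", "555-0199"], "updated": "2021-06-30"}
merged = merge([a, b])
```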

Provenance and Lineage

The automated process that ensures that the data can be traced back to its origin and that the source data is preserved. See also: Flow tracing

Step

Code that processes or enhances the data. A step can be an ingestion step, a mapping step, a Smart Mastering step, or a custom step. See also: About Steps