Data Hub Glossary

Custom Hook Module

Custom code that can perform tasks outside the scope of Data Hub, either immediately before or immediately after a step's main processes. See also: About Custom Modules

Custom Step Module

Custom code that can override default processes or perform additional tasks as the main component of a custom step in a flow. See also: About Custom Modules

Entity

An XML or JSON representation of a high-level business object in your enterprise. Examples of business objects are employee, product, purchase order, and department. See also: Entities

Entity Services

An out-of-the-box API and a set of conventions you can use within MarkLogic to quickly set up an application based on entity modeling.

Flow

A chain of steps that process the data. You can create multiple flows that are configured to process your data differently. See also: About Flows

Flow Tracing

The process that logs information about flows as they run. The inputs to and outputs from every plugin of every flow are recorded in the JOBS database. See also: About Flow Tracing

Harmonization

The conceptual set of processes (typically mapping and mastering) that unify and merge similar data that are represented differently in various raw data sources. See also: Mapping and Mastering

Ingestion

The Data Hub step that takes in your raw data, wraps each item in an envelope, and stores the wrapped items in the STAGING database for further processing. See also: step
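
As an illustration of the wrapping described above, the following is a minimal Python sketch, not Data Hub's own code. The function name `wrap_in_envelope`, the header field names, and the sample record are hypothetical, and the headers/triples/instance layout is an assumption about what an envelope typically contains.

```python
import json
from datetime import datetime, timezone


def wrap_in_envelope(raw_item, source_name):
    """Wrap one raw record in an envelope (assumed layout, for illustration)."""
    return {
        "envelope": {
            # Metadata about the record; these header names are hypothetical.
            "headers": {
                "sourceName": source_name,
                "ingestedAt": datetime.now(timezone.utc).isoformat(),
            },
            # Placeholder for semantic triples; left empty in this sketch.
            "triples": [],
            # The raw item itself, stored unchanged.
            "instance": raw_item,
        }
    }


raw = {"id": 101, "first": "Ada", "last": "Lovelace"}
wrapped = wrap_in_envelope(raw, "hr-export")
print(json.dumps(wrapped, indent=2))
```

The raw item is kept intact under `instance`, so later steps can add or change metadata without altering the source data.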

Mapping

The Data Hub step that associates the fields in the entity model with the corresponding fields in your source data. See also: step

Mapping Expression

A valid XPath expression that can include functions and can use values from one or more source fields to assign a calculated value to an entity property during the mapping process.
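
For example, an expression such as `concat(firstName, ' ', lastName)` would combine two source fields into one entity property. The Python sketch below only emulates that calculation, because Python's standard library does not evaluate XPath functions. The source document, the `fullName` property, and the function `map_full_name` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source record with two separate name fields.
SOURCE = """
<person>
  <firstName>Ada</firstName>
  <lastName>Lovelace</lastName>
</person>
"""


def map_full_name(source_xml):
    """Build an entity property from two source fields."""
    root = ET.fromstring(source_xml.strip())
    # Simple child paths select the source fields.
    first = root.findtext("firstName", default="")
    last = root.findtext("lastName", default="")
    # Emulates the XPath expression: concat(firstName, ' ', lastName)
    return {"fullName": f"{first} {last}"}


entity = map_full_name(SOURCE)
print(entity)  # {'fullName': 'Ada Lovelace'}
```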

Matching

The first process in a Data Hub mastering step, which checks for possible duplicates in your data based on criteria that you define as rules and thresholds. See also: merging
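
One way to picture rules and thresholds is as weighted property comparisons whose total score selects an action. The Python sketch below is a simplified illustration under that assumption, not the Smart Mastering implementation. The weights, threshold values, action names, and sample records are all hypothetical.

```python
# Hypothetical rules: each property that matches exactly adds its weight.
RULES = {"email": 10, "lastName": 5, "postalCode": 3}

# Hypothetical thresholds: (minimum score, action), highest first.
THRESHOLDS = [(13, "merge"), (8, "notify")]


def match_score(a, b):
    """Sum the weights of the properties on which both records agree."""
    return sum(
        weight
        for prop, weight in RULES.items()
        if a.get(prop) is not None and a.get(prop) == b.get(prop)
    )


def classify(score):
    """Return the action of the highest threshold the score reaches."""
    for minimum, action in THRESHOLDS:
        if score >= minimum:
            return action
    return "none"


rec_a = {"email": "ada@example.com", "lastName": "Lovelace", "postalCode": "10001"}
rec_b = {"email": "ada@example.com", "lastName": "Lovelace", "postalCode": "94105"}
rec_c = {"email": "a.l@example.com", "lastName": "Lovelace", "postalCode": "10001"}

print(classify(match_score(rec_a, rec_b)))  # merge  (score 15)
print(classify(match_score(rec_a, rec_c)))  # notify (score 8)
```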

Merging

The second process in a Data Hub mastering step, which determines how matching records are combined. See also: matching
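
As a rough illustration of combining matching records, the Python sketch below keeps, for each property, the value from the most recently updated record that has one. This "newest non-empty value wins" strategy, the `updated` field, and the sample records are assumptions chosen for the example. Data Hub's actual merge behavior depends on how the step is configured.

```python
def merge_records(records):
    """Combine records, preferring values from the most recently updated one."""
    # ISO-8601 date strings sort chronologically, so newest comes first.
    ordered = sorted(records, key=lambda r: r["updated"], reverse=True)
    merged = {}
    for record in ordered:
        for prop, value in record.items():
            if prop == "updated":
                continue
            # Keep the first (newest) non-empty value seen for each property.
            if value is not None and prop not in merged:
                merged[prop] = value
    return merged


duplicates = [
    {"updated": "2022-11-20", "email": "ada@old.example.com", "phone": "555-0100"},
    {"updated": "2023-01-05", "email": "ada@example.com", "phone": None},
]
merged = merge_records(duplicates)
print(merged)  # {'email': 'ada@example.com', 'phone': '555-0100'}
```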

Mastering

The Data Hub step that uses MarkLogic Smart Mastering technology to check for possible duplicates in your data and manage them according to specified criteria. The classic mastering step consists of matching (to determine if two or more records refer to the same entity) and merging (to determine how to merge the records that refer to the same entity). If you have a very large dataset that could produce many merges, you can improve performance by mastering in two steps: a matching step followed by a merging step. See also: step

Provenance and Lineage

The automated process that ensures that the data can be traced back to its origin and that the source data is preserved. See also: Flow tracing

Smart Mastering

The MarkLogic technology that checks for possible duplicates in your data and manages them according to specified criteria. Used in the Data Hub matching, merging, and mastering steps. See also: Smart Mastering Overview, Smart Mastering Framework, and step

Step

Code that processes or enhances the data. A step can be an ingestion step, a mapping step, a matching step, a merging step, a mastering step, or a custom step. See also: About Steps