Data Hub Glossary
Custom Hook Module
Custom code that can perform tasks outside the scope of Data Hub, either immediately before or immediately after a step's main processes. See also: About Custom Modules
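As a rough illustration of where a hook sits relative to a step's main process, the following Python sketch wraps a step function with optional before and after callables. Every name in it (`run_step_with_hooks`, `before_hook`, `after_hook`) is hypothetical and is not part of Data Hub; actual custom hook modules run inside MarkLogic, not in Python.

```python
from typing import Callable, Optional

Record = dict


def run_step_with_hooks(
    records: list[Record],
    main: Callable[[Record], Record],
    before_hook: Optional[Callable[[list[Record]], None]] = None,
    after_hook: Optional[Callable[[list[Record]], None]] = None,
) -> list[Record]:
    """Run a step's main process, calling a hook just before or just after it."""
    if before_hook is not None:
        before_hook(records)   # e.g. archive or validate the inputs
    results = [main(r) for r in records]
    if after_hook is not None:
        after_hook(results)    # e.g. notify an external system
    return results


# Hypothetical usage: the hook does work outside the step itself (audit logging).
audit_log: list[str] = []
out = run_step_with_hooks(
    [{"id": 1}, {"id": 2}],
    main=lambda r: {**r, "processed": True},
    after_hook=lambda rs: audit_log.append(f"{len(rs)} records processed"),
)
```

The hook receives the batch but does not change what the main process returns, which mirrors the idea that hooks handle tasks outside the step's own scope.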
Custom Step Module
Custom code that can override default processes or perform additional tasks as the main component of a custom step in a flow. See also: About Custom Modules
Entity
An XML or JSON representation of a high-level business object in your enterprise. Examples of business objects are employee, product, purchase order, and department. See also: Entities
Entity Services
An out-of-the-box API and a set of conventions you can use within MarkLogic to quickly set up an application based on entity modeling.
Envelope
A set of metadata wrapped around the original data, including harmonized parts of the entity. See also: Envelope Pattern and Envelope Design Pattern (developer.marklogic.com)
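To make the idea of wrapping concrete, here is a minimal Python sketch that places a raw record inside an envelope with some metadata. The key names (`headers`, `triples`, `instance`) and the header fields (`source`, `ingestedAt`) are assumptions chosen for illustration; consult the Envelope Pattern documentation for the layout your Data Hub version actually produces.

```python
from datetime import datetime, timezone


def wrap_in_envelope(raw: dict, source_name: str) -> dict:
    """Wrap a raw record in an envelope (illustrative layout, not an official schema)."""
    return {
        "envelope": {
            # Metadata about the record; field names are hypothetical.
            "headers": {
                "source": source_name,
                "ingestedAt": datetime.now(timezone.utc).isoformat(),
            },
            "triples": [],    # placeholder for semantic facts, left empty here
            "instance": raw,  # the original data, kept unchanged
        }
    }


doc = wrap_in_envelope({"name": "Ada", "dept": "R&D"}, "hr-export")
```

The original record stays intact under `instance`, so later processing can add or harmonize data without losing the source values.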
Flow
A chain of steps that process the data. You can create multiple flows that are configured to process your data differently. See also: About Flows
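The "chain of steps" idea can be sketched in Python as a list of functions applied in order. This is only a conceptual model with hypothetical step functions (`trim_name`, `uppercase_country`); it does not show how flows are defined or run in Data Hub.

```python
from typing import Callable

Step = Callable[[dict], dict]


def run_flow(steps: list[Step], record: dict) -> dict:
    """Pass a record through each step in order; each step's output feeds the next."""
    for step in steps:
        record = step(record)
    return record


def trim_name(record: dict) -> dict:
    return {**record, "name": record["name"].strip()}


def uppercase_country(record: dict) -> dict:
    return {**record, "country": record["country"].upper()}


# Two flows configured to process the same data differently.
full_flow = [trim_name, uppercase_country]
light_flow = [trim_name]

raw = {"name": "  Ada ", "country": "uk"}
full_result = run_flow(full_flow, raw)
light_result = run_flow(light_flow, raw)
```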
Flow tracing
The process that logs information about flows as they run. The inputs to and outputs from every plugin of every flow are recorded in the JOBS database. See also: About Flow Tracing
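Conceptually, tracing resembles wrapping each plugin so that its input and output are recorded on every call. The Python sketch below uses an in-memory list as a stand-in for the JOBS database; the decorator `traced`, the list `trace_log`, and the record layout are all hypothetical and say nothing about how Data Hub stores traces.

```python
import functools

trace_log: list[dict] = []  # in-memory stand-in for the JOBS database


def traced(flow_name: str):
    """Return a decorator that records each call's input and output."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(record: dict) -> dict:
            result = fn(record)
            trace_log.append({
                "flow": flow_name,
                "plugin": fn.__name__,
                "input": record,
                "output": result,
            })
            return result
        return wrapper
    return decorator


@traced("customer-flow")
def add_full_name(record: dict) -> dict:
    return {**record, "fullName": f"{record['first']} {record['last']}"}


add_full_name({"first": "Ada", "last": "Lovelace"})
```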
Harmonization
Ingestion
The Data Hub step that takes in your raw data, wraps each item in an envelope, and stores the wrapped items in the STAGING database for further processing. See also: step
Mapping
The Data Hub step that associates the fields in the entity model with the corresponding fields in your source data. See also: step
Mapping Expression
A valid XPath expression that can include functions and can use values from one or more source fields to assign a calculated value to an entity property during the mapping process.
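The sketch below approximates what a mapping expression does: it reads one or more source fields and assigns a calculated value to an entity property. It is written in Python with the standard library's `xml.etree.ElementTree`, which supports only simple path lookups, so the calculations themselves are done in Python and the XPath expressions they stand in for appear in comments. The source fields and property names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
source = ET.fromstring(
    "<order><firstName>Ada</firstName><lastName>Lovelace</lastName>"
    "<unitPrice>2.50</unitPrice><quantity>4</quantity></order>"
)

# Entity property -> function computing its value from the source fields.
mapping = {
    # Stands in for the XPath expression: concat(firstName, ' ', lastName)
    "customerName": lambda src: f"{src.findtext('firstName')} {src.findtext('lastName')}",
    # Stands in for the XPath expression: unitPrice * quantity
    "total": lambda src: float(src.findtext("unitPrice")) * int(src.findtext("quantity")),
}

entity = {prop: expr(source) for prop, expr in mapping.items()}
```

Here `customerName` combines two source fields and `total` is calculated from two others, which is the kind of value a plain field-to-field mapping cannot express.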
Matching
The first process in a Data Hub mastering step, which checks your data for possible duplicates based on criteria that you define as rules and thresholds. See also: merging
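One way to picture rules and thresholds is as weighted property comparisons whose total score is checked against cutoffs. The following Python sketch assumes that model; the rule weights, threshold values, and action names (`merge`, `notify`) are hypothetical and are not taken from any Data Hub configuration.

```python
# Hypothetical rules: (property, weight added when both records agree on it).
RULES = [("email", 10), ("lastName", 5), ("postalCode", 3)]

# Hypothetical thresholds, highest first: (minimum score, resulting action).
THRESHOLDS = [(15, "merge"), (8, "notify")]


def match_score(a: dict, b: dict) -> int:
    """Sum the weights of all rules whose property is present and equal in both records."""
    return sum(
        weight
        for prop, weight in RULES
        if a.get(prop) is not None and a.get(prop) == b.get(prop)
    )


def classify(a: dict, b: dict) -> str:
    """Return the action of the highest threshold the pair's score reaches."""
    score = match_score(a, b)
    for minimum, action in THRESHOLDS:
        if score >= minimum:
            return action
    return "no-match"


r1 = {"email": "ada@example.com", "lastName": "Lovelace", "postalCode": "W1"}
r2 = {"email": "ada@example.com", "lastName": "Lovelace", "postalCode": "N7"}
r3 = {"email": "al@example.com", "lastName": "Lovelace", "postalCode": "W1"}
r4 = {"email": "bob@example.com", "lastName": "Byron", "postalCode": "E2"}
```

With these values, `r1` and `r2` agree on email and last name (score 15), `r1` and `r3` agree on last name and postal code (score 8), and `r1` and `r4` agree on nothing.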
Merging
The second process in a Data Hub mastering step, which determines how matching records are combined. See also: matching
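As an illustration of "how matching records are combined", the Python sketch below applies one possible strategy: for each property, keep the non-empty value from the most recently updated record. This strategy, the `updated` field, and the function name are assumptions made for the example; Data Hub merging behavior is governed by your own merge configuration.

```python
def merge_records(records: list[dict]) -> dict:
    """Per property, keep the value from the most recently updated record that has one."""
    merged: dict = {}
    # Process oldest first so that newer non-empty values overwrite older ones.
    for rec in sorted(records, key=lambda r: r["updated"]):
        for key, value in rec.items():
            if value not in (None, ""):
                merged[key] = value
    return merged


older = {"name": "A. Lovelace", "phone": "555-0100", "updated": "2020-01-01"}
newer = {"name": "Ada Lovelace", "phone": "", "updated": "2021-06-01"}
merged = merge_records([newer, older])
```

The newer name wins, while the phone number survives from the older record because the newer one left it empty.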
Mastering
The Data Hub step that uses MarkLogic Smart Mastering technology to check for possible duplicates in your data and manage them according to specified criteria. The classic mastering step consists of matching (to determine whether two or more records refer to the same entity) and merging (to determine how to merge the records that refer to the same entity). If you have a very large dataset with potentially many merges, you can improve performance by mastering in two steps: a matching step and a merging step. See also: step
Provenance and Lineage
The automated process that ensures that the data can be traced back to its origin and that the source data is preserved. See also: Flow tracing
Smart Mastering
The MarkLogic technology that checks for possible duplicates in your data and manages them according to specified criteria. Used in the Data Hub matching, merging, and mastering steps. See also: Smart Mastering Overview, Smart Mastering Framework, and step
Step
Code that processes or enhances the data. A step can be an ingestion step, a mapping step, a matching step, a merging step, a mastering step, or a custom step. See also: About Steps