Data Hub Glossary

custom hook

A custom module that performs additional processes in its own transaction before or after the core step transaction. Results are saved within each transaction. Learn more: About Interceptors and Custom Hooks

custom hook module

Custom code that can perform tasks outside the scope of Data Hub, either immediately before or immediately after a step's main processes. Learn more: About Custom Modules

custom step module

Custom code that can override default processes or perform additional tasks as the main component of a custom step in a flow. Learn more: About Custom Modules

entity

An abstraction of a logical business object that can be stored and manipulated by applications. For example, a sales model might include entities such as a customer, order, or inventory item. Learn more: Entities

entity instance

A concrete instance of an entity type: a populated data structure that represents an individual entity, or a document that contains such a data structure.

entity property

A concrete characteristic of an entity type. For example, a customer entity type might have properties such as a name, address, and customer ID. Entity properties can be of a basic type (string, integer, etc.), a structured type, or a relationship type.

entity relationship

A logical relationship between entity types. For example, an order entity type might include relationships with the customer and inventory item entity types. In Entity Services, an entity relationship is expressed as an entity property whose type is an entity type (rather than a scalar or array type). Learn more: Relationship Type and Defining Entity Relationships

Entity Services

An out-of-the-box API and a set of conventions you can use within MarkLogic to quickly set up an application based on entity modeling.

entity type

A custom data type that defines the characteristics of an entity instance, including its properties and relationships to other entities. Learn more: Entities
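As a rough illustration of how entity types, entity properties, structured types, and relationships fit together, the following Python sketch models the customer and order example used in this glossary. The class and field names (Address, Customer, Order, customer_id, and so on) are hypothetical and are not part of Entity Services; actual entity types are defined in entity models, not in Python classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:
    """Hypothetical structured type: a reusable nested structure."""
    street: str
    city: str


@dataclass
class Customer:
    """Hypothetical entity type with basic and structured properties."""
    customer_id: int   # basic type
    name: str          # basic type
    address: Address   # structured type


@dataclass
class Order:
    """Hypothetical entity type with a relationship property."""
    order_id: int
    customer: Customer  # relationship: the property's type is another entity type
    items: List[str] = field(default_factory=list)


# An entity instance is a populated structure of an entity type.
alice = Customer(customer_id=1, name="Alice", address=Address("1 Main St", "Springfield"))
order = Order(order_id=100, customer=alice, items=["widget"])
```

Here `alice` and `order` play the role of entity instances, while the classes play the role of entity types.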

flow

A chain of steps that process the data. You can create multiple flows that are configured to process your data differently. Learn more: About Flows

flow configuration

A JSON structure that contains the settings for a specific flow. In the QuickStart format, this structure includes nodes that contain the step configurations. In the Hub Central format, this structure includes references to other files that contain the step configurations. Learn more: About Flow and Step Configuration Structures
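The difference between the two formats can be sketched as follows. The key names (name, steps, stepId, options, targetDatabase) and the step names are assumptions chosen for illustration; consult About Flow and Step Configuration Structures for the actual schema.

```python
import json

# QuickStart-style sketch: step configurations are embedded in the flow.
quickstart_flow = {
    "name": "CustomerFlow",
    "steps": {
        "1": {"name": "loadCustomers", "options": {"targetDatabase": "staging"}},
    },
}

# Hub Central-style sketch: the flow refers to step configurations stored elsewhere.
hub_central_flow = {
    "name": "CustomerFlow",
    "steps": {"1": {"stepId": "loadCustomers-ingestion"}},
}

# Stand-in for the separate step configuration files (hypothetical contents).
step_files = {
    "loadCustomers-ingestion": {
        "name": "loadCustomers",
        "options": {"targetDatabase": "staging"},
    },
}


def resolve_steps(flow, step_files):
    """Return the flow with every step reference replaced by its configuration."""
    resolved = {}
    for number, step in flow["steps"].items():
        # A step that carries a stepId is a reference; otherwise it is inline.
        resolved[number] = step_files[step["stepId"]] if "stepId" in step else step
    return {"name": flow["name"], "steps": resolved}


print(json.dumps(resolve_steps(hub_central_flow, step_files), indent=2))
```

In this sketch, resolving the references in the Hub Central-style flow yields the same structure as the QuickStart-style flow, which is the sense in which the two formats carry the same settings.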

flow tracing

The process that logs information about the flows as they run. Inputs to and outputs from every plugin of every flow are recorded into the JOBS database.

harmonization

The conceptual set of processes (typically mapping and mastering) that unify and merge similar data that are represented differently in various raw data sources. Related terms: mapping, mastering

ingestion

The Data Hub step that takes in your raw data, wraps each item in an envelope, and stores the wrapped items in the STAGING database for further processing. This step is the same as the Loading step in Hub Central. Related term: step
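A minimal sketch of the wrapping described above. The envelope layout (headers, triples, instance) follows the common MarkLogic envelope pattern, but the exact fields that Data Hub writes, the URI scheme, and the in-memory dictionary standing in for the STAGING database are assumptions for illustration.

```python
def wrap_in_envelope(raw_item, source_name):
    """Wrap one raw item in an envelope with a headers section for metadata."""
    return {
        "envelope": {
            "headers": {"sources": [{"name": source_name}]},  # assumed metadata field
            "triples": [],
            "instance": raw_item,  # the raw data is kept unchanged
        }
    }


staging = {}  # hypothetical stand-in for the STAGING database, keyed by URI

raw_items = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
for item in raw_items:
    uri = f"/customers/{item['id']}.json"  # hypothetical URI scheme
    staging[uri] = wrap_in_envelope(item, "crm-export")
```

Keeping the raw item intact inside the envelope lets later steps add metadata without altering the source data.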

interceptor

A custom module that performs additional processes after the core step processes and before the results are saved. Learn more: About Interceptors and Custom Hooks
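The ordering difference between interceptors and custom hooks can be sketched as follows. This is a simplified model based only on the definitions in this glossary; the function names and the event log are hypothetical, and transactions are represented only as labeled events rather than real database transactions.

```python
def run_step(items, core, interceptor=None, before_hook=None, after_hook=None):
    """Simulate one step and return (saved results, ordered event log)."""
    events = []
    if before_hook:
        before_hook(items)
        events.append("before hook: own transaction")
    results = [core(item) for item in items]
    events.append("core step processes")
    if interceptor:
        # The interceptor sees the core results before they are saved.
        results = [interceptor(r) for r in results]
        events.append("interceptor: before save")
    events.append("results saved")
    if after_hook:
        after_hook(results)
        events.append("after hook: own transaction")
    return results, events


saved, log = run_step(
    ["a", "b"],
    core=str.upper,
    interceptor=lambda r: r + "!",
    after_hook=lambda rs: None,
)
```

In this model the interceptor changes what is saved, whereas a hook runs separately from the core step and does not alter the core step's saved results.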

loading

The Hub Central step that takes in your raw data, wraps each item in an envelope, and stores the wrapped items in a database for further processing. This step is the same as the Ingestion step in other tools. Related term: step

mapping

The Data Hub step that associates the fields in the entity model with the corresponding fields in your source data. Related term: step

mapping expression

A valid XPath expression that can include functions and can use values from one or more source fields to assign a calculated value to an entity property during the mapping process.

mastering

The Data Hub step that uses MarkLogic Smart Mastering technology to check for possible duplicates in your data and manage them based on specified criteria. The classic mastering step consists of matching (to determine whether two or more records refer to the same entity) and merging (to determine how to combine the records that refer to the same entity). If you have a very large dataset that could require many merges, you can improve performance by mastering in two steps: a matching step and a merging step. Related term: step

matching

The first process in a Data Hub mastering step. It checks for possible duplicates in your data based on criteria that you define as rules and thresholds. Related term: merging

match threshold

The degree of matching that triggers a specific action. Related terms: matching, match ruleset
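One way to picture rules and thresholds: each rule that matches contributes a weight to a score, and the highest threshold that the score reaches determines the action. The weights, threshold values, action names, and the exact-equality comparison below are hypothetical and do not reflect the actual Smart Mastering scoring algorithm.

```python
# Hypothetical match rules: property name -> weight added when values are equal.
RULES = {"email": 10, "last_name": 5, "postal_code": 3}

# Hypothetical thresholds: (minimum score, action), highest first.
THRESHOLDS = [(15, "merge"), (8, "notify")]


def match_score(a, b):
    """Sum the weights of all rules whose property values are present and equal."""
    return sum(
        weight
        for prop, weight in RULES.items()
        if a.get(prop) is not None and a.get(prop) == b.get(prop)
    )


def action_for(score):
    """Return the action of the highest threshold reached, or None."""
    for minimum, action in THRESHOLDS:
        if score >= minimum:
            return action
    return None


rec1 = {"email": "a@example.com", "last_name": "Smith", "postal_code": "12345"}
rec2 = {"email": "a@example.com", "last_name": "Smith", "postal_code": "99999"}
```

With these sample values, `rec1` and `rec2` agree on email and last name, so `action_for(match_score(rec1, rec2))` selects the action of the higher threshold.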

merging

The second process in a Data Hub mastering step, which determines how matching records are combined. Related term: matching

merge rule

A set of conditions and actions that define how to combine the values for the same property in matching documents. Related terms: merging, merge strategy

merge strategy

A predefined standard merge rule that can be reused. Related terms: merging, merge rule
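A sketch of how a merge rule might pick values for one property, and how a named strategy lets several properties share the same settings. The rule fields (max_values, source_priority), the strategy name, and the record layout are hypothetical and are not the Data Hub configuration schema.

```python
# Hypothetical merge strategy: a named, reusable set of merge-rule settings.
STRATEGIES = {
    "prefer-crm": {"max_values": 1, "source_priority": ["crm", "web"]},
}

# Hypothetical merge rules: one per property, each reusing a strategy.
MERGE_RULES = {"name": "prefer-crm", "phone": "prefer-crm"}


def merge_property(prop, records):
    """Combine one property's values from matching records according to its rule."""
    rule = STRATEGIES[MERGE_RULES[prop]]
    priority = {src: i for i, src in enumerate(rule["source_priority"])}
    candidates = [r for r in records if prop in r["values"]]
    # Records from higher-priority sources come first; unknown sources go last.
    candidates.sort(key=lambda r: priority.get(r["source"], len(priority)))
    return [r["values"][prop] for r in candidates][: rule["max_values"]]


records = [
    {"source": "web", "values": {"name": "A. Smith", "phone": "555-0100"}},
    {"source": "crm", "values": {"name": "Alice Smith"}},
]
merged = {prop: merge_property(prop, records) for prop in MERGE_RULES}
```

Defining the settings once in a strategy and referring to it by name keeps the per-property rules short and consistent.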

modeling

The Data Hub step where you define the entity types that your data is mapped to for standardization. Related terms: entity type, mapping

provenance and lineage

The automated process that ensures that the data can be traced back to its origin and that the source data is preserved. Related term: flow tracing

relationship type

A link to another entity type that can be used as the data type of an entity property.

Smart Mastering

The MarkLogic technology that checks for possible duplicates in your data and manages them based on specified criteria. It is used in the Data Hub matching, merging, and mastering steps. Learn more: Smart Mastering Overview and Smart Mastering Framework. Related term: step

step

Code that processes or enhances the data. A step can be an ingestion step, a mapping step, a matching step, a merging step, a mastering step, or a custom step. Learn more: About Steps

step configuration

A JSON structure that contains the settings for a specific step instance. In the QuickStart format, this structure is embedded within the flow configuration. In the Hub Central format, this structure is in its own file, which the flow configuration refers to. Learn more: About Flow and Step Configuration Structures

step definition

A JSON structure that serves as a template for a step configuration of a specific type (ingestion/loading, mapping, matching, merging, mastering, or custom). Step definitions are stored in files separate from the flow configuration. Learn more: About Steps

structured type

A custom structure that can be used as the data type of an entity property. A structured type is essentially a nested entity type within another entity type and can be reused in other entity types.

user artifact

Project files or records containing the configuration settings for Data Hub, including entity models, flows, step definitions, and steps. Related term: user data

user data

Documents that have been ingested or processed by flows and steps, as well as match summaries produced by the matching step. User data does not include user artifacts. Related term: user artifact