Data Hub Glossary
custom hook
A custom module that performs additional processes in its own transaction, either before or after the core step transaction. Each transaction saves its own results.
Learn more:
About Interceptors and Custom Hooks
custom hook module
Custom code that can perform tasks outside the scope of Data Hub, either immediately before or immediately after a step's main processes.
Learn more:
About Custom Modules
custom step module
Custom code that can override default processes or perform additional tasks as the main component of a custom step in a flow.
Learn more:
About Custom Modules
entity
An abstraction of a logical business object that can be stored and manipulated by applications. For example, a sales model might include entities such as a customer, order, or inventory item.
Learn more: Entities
entity instance
A concrete instantiation of an entity type: a populated data structure that represents an individual entity, or a document that contains such a data structure.
entity feature
A data characteristic of an entity type that you can add to implement special functions. An entity feature can be defined in an entity type and a step.
entity property
A concrete characteristic of an entity type. For example, a customer entity type might have properties such as a name, address, and customer ID. Entity properties can be of a basic type (string, integer, etc.), a structured type, or a relationship type.
entity relationship
A logical relationship between entity types. For example, an order entity type might include relationships with customer and inventory item entity types. In Entity Services, an entity relationship is expressed as an entity property whose type is an entity type (rather than a scalar or array type).
Learn more: Relationship Type and Defining Entity Relationships.
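As an illustration only, the order example above can be sketched with a simplified model descriptor loosely based on the Entity Services JSON layout, in which a property whose type is another entity type carries a "$ref" such as "#/definitions/Customer". The model, the entity names, and the helper relationships() are hypothetical. Real Data Hub models can also use "$ref" for structured types, so treating every "$ref" as a relationship is a simplification made for this sketch.

```python
# Hypothetical model descriptor loosely following the Entity Services JSON layout.
# Scalar properties declare a "datatype"; a property whose type is another entity
# type refers to it with "$ref" (treated here, for simplicity, as a relationship).
model = {
    "info": {"title": "Sales", "version": "0.0.1"},
    "definitions": {
        "Customer": {
            "properties": {
                "customerId": {"datatype": "string"},
                "name": {"datatype": "string"},
            }
        },
        "InventoryItem": {
            "properties": {"sku": {"datatype": "string"}},
        },
        "Order": {
            "properties": {
                "orderId": {"datatype": "string"},
                "customer": {"$ref": "#/definitions/Customer"},
                "item": {"$ref": "#/definitions/InventoryItem"},
            }
        },
    },
}


def relationships(model: dict, entity_type: str) -> dict[str, str]:
    """Return {property name: related entity type} for one entity type."""
    result = {}
    for name, spec in model["definitions"][entity_type]["properties"].items():
        ref = spec.get("$ref", "")
        if ref.startswith("#/definitions/"):
            # Keep only the entity type name at the end of the reference.
            result[name] = ref.rsplit("/", 1)[-1]
    return result


print(relationships(model, "Order"))
# {'customer': 'Customer', 'item': 'InventoryItem'}
```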
Entity Services
An out-of-the-box API and a set of conventions you can use within MarkLogic to quickly set up an application based on entity modeling.
entity type
A custom data type that defines the characteristics of an entity instance, including its properties and relationships to other entities.
Learn more: Entities
envelope
A set of metadata wrapped around the original data, including harmonized parts of the entity.
Learn more:
Envelope Pattern and Envelope Design Pattern (developer.marklogic.com)
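A minimal sketch of the envelope idea: the harmonized entity data and some metadata are wrapped around an untouched copy of the original record. The section names (headers, triples, instance, attachments), the createdOn header, and the helper wrap_in_envelope() are assumptions for illustration; check the envelope that your Data Hub version actually writes.

```python
import copy
from datetime import datetime, timezone


def wrap_in_envelope(source: dict, instance: dict, source_name: str) -> dict:
    """Wrap harmonized entity data and the original source record in an envelope.

    Hypothetical layout: the section and header names are assumed, not
    taken from Data Hub's own code.
    """
    return {
        "envelope": {
            # Metadata about the record, e.g. where it came from and when.
            "headers": {
                "sources": [{"name": source_name}],
                "createdOn": datetime.now(timezone.utc).isoformat(),
            },
            # Semantic triples, if any (none in this sketch).
            "triples": [],
            # The harmonized entity instance.
            "instance": instance,
            # A copy of the original data, preserved unchanged.
            "attachments": copy.deepcopy(source),
        }
    }


raw = {"CustID": "42", "FName": "Ada", "LName": "Lovelace"}
doc = wrap_in_envelope(raw, {"Customer": {"customerId": "42", "name": "Ada Lovelace"}}, "crm")
print(doc["envelope"]["instance"]["Customer"]["name"])  # Ada Lovelace
```

Copying the source keeps later changes to the harmonized instance from altering the preserved original.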
flow
A chain of steps that process the data. You can create multiple flows that are configured to process your data differently.
Learn more:
About Flows
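Conceptually, a flow passes data through its steps in order, with each step receiving the previous step's output. The sketch below models that chaining with plain Python functions. It is not Data Hub's API: run_flow(), ingest(), and map_to_customer() are hypothetical stand-ins for a flow and for ingestion and mapping steps.

```python
from typing import Callable, Iterable

# In this sketch a "step" is any function from a list of records to a new list.
Step = Callable[[list[dict]], list[dict]]


def run_flow(records: list[dict], steps: Iterable[Step]) -> list[dict]:
    """Pass the records through each step in order; each step sees the previous output."""
    for step in steps:
        records = step(records)
    return records


# Hypothetical steps standing in for ingestion and mapping.
def ingest(records):
    return [dict(r, ingested=True) for r in records]


def map_to_customer(records):
    return [{"customerId": r["CustID"], "ingested": r["ingested"]} for r in records]


print(run_flow([{"CustID": "42"}], [ingest, map_to_customer]))
# [{'customerId': '42', 'ingested': True}]
```

Passing a different list of steps to run_flow() corresponds to configuring another flow that processes the same data differently.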
flow configuration
A JSON structure that contains the settings for a specific flow. The structure includes references to other files that contain the step configurations.
Learn more:
About Flow and Step Configuration Structures
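The sketch below shows how a flow configuration might refer to step configurations stored elsewhere, and how those references could be resolved in execution order. The flow layout (steps keyed by position, each with a "stepId"), the step IDs, and the step configuration fields are assumptions modeled loosely on recent Data Hub versions. The step_configs dictionary stands in for the separate step configuration files.

```python
import json

# Hypothetical flow configuration: steps are referenced by ID and keyed by
# their position in the flow. The "stepId" key and ID format are assumptions.
flow_json = """
{
  "name": "CustomerFlow",
  "steps": {
    "2": {"stepId": "mapCustomers-mapping"},
    "1": {"stepId": "loadCustomers-ingestion"}
  }
}
"""

# Stand-in for the separate step configuration files that the flow refers to.
step_configs = {
    "loadCustomers-ingestion": {"stepDefinitionType": "ingestion", "sourceFormat": "csv"},
    "mapCustomers-mapping": {"stepDefinitionType": "mapping", "targetEntityType": "Customer"},
}


def resolve_steps(flow: dict, configs: dict) -> list[dict]:
    """Return the referenced step configurations in execution order."""
    ordered_keys = sorted(flow["steps"], key=int)  # numeric sort, so "10" follows "9"
    return [configs[flow["steps"][k]["stepId"]] for k in ordered_keys]


flow = json.loads(flow_json)
for cfg in resolve_steps(flow, step_configs):
    print(cfg["stepDefinitionType"])
# ingestion
# mapping
```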
flow tracing
The process that logs information about flows as they run. The inputs to and outputs from every plugin of every flow are recorded in the JOBS database.
harmonization
ingestion
interceptor
A custom module that performs additional processes after the core step processes and before the results are saved.
Learn more:
About Interceptors and Custom Hooks
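To contrast the two extension points defined in this glossary, the simulation below models only their ordering. A custom hook runs in its own transaction before or after the step, while an interceptor changes the step's results before they are saved, inside the step's transaction. Everything here is hypothetical: run_step(), run_in_transaction(), and the in-memory database list are not Data Hub APIs.

```python
# Hypothetical simulation: each "transaction" commits its writes to an
# in-memory list, and the log records the order of the commits.
database: list[str] = []
log: list[str] = []


def run_in_transaction(name, work):
    """Run work() and commit whatever it returns as one batch."""
    writes = work()
    database.extend(writes)
    log.append(f"commit:{name}")


def run_step(records, interceptors=(), before_hook=None, after_hook=None):
    if before_hook:
        # A custom hook gets its own transaction before the step.
        run_in_transaction("before-hook", before_hook)

    def core():
        results = [r.upper() for r in records]  # stand-in for the step's main processing
        for interceptor in interceptors:
            # Interceptors modify results before they are saved, in the same transaction.
            results = interceptor(results)
        return results

    run_in_transaction("step", core)
    if after_hook:
        # A custom hook gets its own transaction after the step.
        run_in_transaction("after-hook", after_hook)


run_step(
    ["a", "b"],
    interceptors=[lambda rs: [r + "!" for r in rs]],
    before_hook=lambda: ["audit-start"],
    after_hook=lambda: ["audit-end"],
)
print(database)  # ['audit-start', 'A!', 'B!', 'audit-end']
print(log)       # ['commit:before-hook', 'commit:step', 'commit:after-hook']
```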
loading
mapping
The Data Hub step that associates the fields in the entity model with the corresponding fields in your source data.
Related term: step
mapping expression
A valid XPath expression that can include functions and can use values from one or more source fields to assign a calculated value to an entity property during the mapping process.
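As a rough illustration of mapping and mapping expressions, the sketch below assigns each entity property either a source field name (a direct copy) or a computed value. In Data Hub the computed value comes from an XPath mapping expression such as concat(FName, ' ', LName). Here a Python callable only imitates that idea. The field names, customer_mapping, and apply_mapping() are hypothetical.

```python
# Hypothetical mapping from entity properties to source data. A plain string
# copies a source field; a callable stands in for a mapping expression that
# computes a value from one or more fields (in Data Hub this would be an
# XPath expression such as concat(FName, ' ', LName)).
customer_mapping = {
    "customerId": "CustID",
    "name": lambda src: f"{src['FName']} {src['LName']}",
}


def apply_mapping(source: dict, mapping: dict) -> dict:
    """Build an entity instance from one source record."""
    instance = {}
    for prop, rule in mapping.items():
        instance[prop] = rule(source) if callable(rule) else source[rule]
    return instance


print(apply_mapping({"CustID": "42", "FName": "Ada", "LName": "Lovelace"}, customer_mapping))
# {'customerId': '42', 'name': 'Ada Lovelace'}
```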
mastering
The Data Hub step that uses MarkLogic Smart Mastering technology to check for possible duplicates in your data and manage them according to specified criteria. The classic mastering step consists of matching (determining whether two or more records refer to the same entity) and merging (determining how to combine the records that refer to the same entity). If you have a very large dataset that is likely to produce many merges, you can improve performance by mastering in two steps: a matching step and a merging step.
Related term: step
matching
The first process in a Data Hub mastering step. Matching checks for possible duplicates in your data based on criteria that you define as rules and thresholds.
Related term: merging
match ruleset
The weighted criteria for what is considered a match.
Related terms: matching, match threshold
match threshold
The degree of matching that would trigger a specific action.
Related terms: matching, match ruleset
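The relationship between weighted rulesets and thresholds can be sketched as simple additive scoring. Each ruleset that matches adds its weight to the score, and the highest threshold reached determines the action. This is a simplification, not Smart Mastering's actual algorithm. The ruleset names, weights, threshold values, and match_action() are made up for illustration.

```python
# Hypothetical match rulesets: (name, weight, predicate comparing two records).
rulesets = [
    ("exact last name", 10, lambda a, b: a["lastName"] == b["lastName"]),
    ("exact email", 20, lambda a, b: a["email"].lower() == b["email"].lower()),
    ("same zip", 5, lambda a, b: a["zip"] == b["zip"]),
]

# Hypothetical thresholds: the minimum score that triggers each action.
thresholds = [("merge", 30), ("notify", 15)]


def match_action(a: dict, b: dict):
    """Score a pair of records and return (highest action triggered or None, score)."""
    score = sum(weight for _, weight, rule in rulesets if rule(a, b))
    for action, minimum in sorted(thresholds, key=lambda t: t[1], reverse=True):
        if score >= minimum:
            return action, score
    return None, score


ada_1 = {"lastName": "Lovelace", "email": "ADA@example.org", "zip": "10001"}
ada_2 = {"lastName": "Lovelace", "email": "ada@example.org", "zip": "10002"}
other = {"lastName": "Lovelace", "email": "a.l@example.org", "zip": "10001"}
print(match_action(ada_1, ada_2))  # ('merge', 30)
print(match_action(ada_1, other))  # ('notify', 15)
```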
merging
The second process in a Data Hub mastering step. Merging determines how matching records are combined.
Related term: matching
merge rule
A set of conditions and actions that define how to combine the values for the same property in matching documents.
Related terms: merging, merge strategy
merge strategy
A predefined standard merge rule that can be reused.
Related terms: merging, merge rule
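One way to picture merge rules and merge strategies: a strategy is a reusable function that chooses the value(s) to keep for a property across matching documents, and the merge rules assign a strategy to each property. The strategy names (most_recent, keep_all_distinct), the "updated" timestamp field, and merge() are hypothetical and are not Smart Mastering's built-in options.

```python
from typing import Callable

# A merge strategy in this sketch: a reusable function that picks the value(s)
# to keep for one property from all matching documents.
MergeStrategy = Callable[[list[dict], str], object]


def most_recent(docs: list[dict], prop: str):
    """Keep the value from the document with the latest 'updated' timestamp."""
    candidates = [d for d in docs if prop in d]
    return max(candidates, key=lambda d: d["updated"])[prop]


def keep_all_distinct(docs: list[dict], prop: str):
    """Keep every distinct value, in first-seen order."""
    seen = []
    for d in docs:
        if prop in d and d[prop] not in seen:
            seen.append(d[prop])
    return seen


# Merge rules: which strategy applies to which property.
merge_rules: dict[str, MergeStrategy] = {
    "address": most_recent,
    "phone": keep_all_distinct,
}


def merge(docs: list[dict]) -> dict:
    return {prop: strategy(docs, prop) for prop, strategy in merge_rules.items()}


docs = [
    {"address": "1 Old St", "phone": "555-0100", "updated": "2021-01-05"},
    {"address": "9 New Ave", "phone": "555-0199", "updated": "2023-06-30"},
    {"phone": "555-0100", "updated": "2022-03-12"},
]
print(merge(docs))
# {'address': '9 New Ave', 'phone': ['555-0100', '555-0199']}
```

ISO 8601 date strings sort chronologically as plain strings, which is why max() can compare the "updated" values directly.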
modeling
The Data Hub step where you define the entity types that your data will be mapped to for standardization.
Related terms: entity type, mapping
provenance and lineage
The automated process that ensures that the data can be traced back to its origin and that the source data is preserved.
Related term: flow tracing
relationship type
A link to another entity type that can be used as the data type of an entity property.
Smart Mastering
The MarkLogic technology that checks for possible duplicates in your data and manages them accordingly based on specified criteria. Used in the Data Hub matching, merging, and mastering steps.
Learn more: Smart Mastering Overview and Smart Mastering Framework
Related term: step
step
Code that processes or enhances the data. A step can be an ingestion step, a mapping step, a matching step, a merging step, a mastering step, or a custom step.
Learn more:
About Steps
step configuration
A JSON structure that contains the settings for a specific step instance. The structure is in its own file, which the flow configuration refers to.
Learn more:
About Flow and Step Configuration Structures
step definition
A JSON structure that serves as a template for a step configuration of a specific type (ingestion/loading, mapping, matching, merging, mastering, or custom). Step definitions are stored in files separate from the flow configuration.
Learn more:
About Steps
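The template relationship between a step definition and a step configuration can be sketched as defaults plus overrides. The definition supplies default settings for one step type, and each step configuration copies them and overrides what it needs. The setting names (batchSize, threadCount, validateEntity, targetEntityType) and make_step_config() are assumptions for illustration, not a description of Data Hub's actual merge behavior.

```python
import copy

# Hypothetical step definition: a template holding defaults for one step type.
mapping_step_definition = {
    "type": "mapping",
    "batchSize": 100,
    "threadCount": 4,
    "options": {"validateEntity": False},
}


def make_step_config(definition: dict, overrides: dict) -> dict:
    """Create a step configuration from a definition template plus overrides.

    Nested dictionaries are merged one level deep; other values are replaced.
    The definition itself is left unchanged.
    """
    config = copy.deepcopy(definition)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(config.get(key), dict):
            config[key].update(value)
        else:
            config[key] = value
    return config


cfg = make_step_config(
    mapping_step_definition,
    {"name": "mapCustomers", "batchSize": 250, "options": {"targetEntityType": "Customer"}},
)
print(cfg["batchSize"], cfg["options"])
# 250 {'validateEntity': False, 'targetEntityType': 'Customer'}
```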
structured type
A custom structure that can be used as the data type of an entity property. A structured type is essentially a nested entity type within another entity type and can be reused in other entity types.
user artifact
Project files or records containing the configuration settings for Data Hub, including entity models, flows, step definitions, and steps.
Related term: user data
user data
Documents that have been ingested or processed by flows and steps, as well as match summaries produced by the matching step. User data does not include user artifacts.
Related term: user artifact