MarkLogic Data Hub

Overview

MarkLogic Data Hub is a set of tools and libraries that help you quickly build an operational data hub on MarkLogic Server.

A data hub is a repository that consolidates data from various silos. The operational data hub pattern is a way of building data hubs that facilitates faster and more agile data integration, while allowing real-time concurrent interactive access to data. For example, shared services and analytics reports can access the data hub at the same time, even while additional data is integrated into the data hub.

In addition, an operational data hub:
  • Secures data and operations at the entity or attribute level.
  • Traces data lineage: where the data came from, who loaded it, and when.

MarkLogic Data Hub generates code scaffolding to jumpstart your development and provides DevOps processes for setting up a fully automated build.

Concepts in MarkLogic Data Hub

You can ingest data from various sources into the Data Hub and standardize it so that applications can consume it more easily. You do this by running steps in a flow.

A flow can have any combination of the following types of steps:

  • Ingestion to pull your data into the Data Hub.
  • Mapping to standardize your data using an entity model.
  • Matching to search for duplicates. (The first step of a split-step mastering process.)
  • Merging to merge duplicates. (The second step of a split-step mastering process.)
  • Mastering to match and merge duplicates. (The classic combined-step mastering process.)
  • Custom to allow you to perform additional processing on your data or to replace the default behavior of other step types.

To learn more, see Entities, Flows, and Steps.

Get Started!