Curating with Hub Central
Overview
A typical Data Hub data flow involves the following operations:
- Load/Ingest your raw data into MarkLogic Server.
- Create an entity model to standardize your data fields.
- Map the fields in your raw data to the fields of the entity model.
- (Optional) Match and merge duplicates.
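The operations above can be sketched in a few lines of plain Python. This is only a conceptual illustration, not Data Hub or Hub Central code: the sample records, the field names, the `MAPPINGS` table, and the helpers `map_record` and `match_and_merge` are all hypothetical, and the merge rule (keep the first record per lowercased email) is an assumption chosen for brevity.

```python
# Conceptual sketch only: none of these names are part of Data Hub.

# 1. "Loaded" raw data: two hypothetical sources with different field names.
RAW = [
    {"source": "crm", "fname": "Ada", "lname": "Lovelace", "mail": "ada@example.com"},
    {"source": "web", "first": "Ada", "last": "Lovelace", "email": "ADA@example.com"},
    {"source": "web", "first": "Alan", "last": "Turing", "email": "alan@example.com"},
]

# 2. Entity model: the standardized field names.
ENTITY_FIELDS = ("firstName", "lastName", "email")

# 3. Mapping: for each source, raw field name -> entity field name.
MAPPINGS = {
    "crm": {"fname": "firstName", "lname": "lastName", "mail": "email"},
    "web": {"first": "firstName", "last": "lastName", "email": "email"},
}


def map_record(raw):
    """Rename a raw record's fields to the entity model's field names."""
    mapping = MAPPINGS[raw["source"]]
    return {entity: raw[field] for field, entity in mapping.items()}


def match_and_merge(entities):
    """Treat records with the same lowercased email as duplicates.

    Assumed merge rule: keep the first record seen for each email.
    """
    merged = {}
    for entity in entities:
        merged.setdefault(entity["email"].lower(), entity)
    return list(merged.values())


mapped = [map_record(r) for r in RAW]
curated = match_and_merge(mapped)  # 4. optional match and merge
```

In this sketch, the three raw records become three mapped entities, and the two "Ada" records collapse into one after matching on email.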
Security
You must be assigned the following security roles:
- To view, create, edit, or delete a step: Hub Central Developer or Hub Central Curator
- To add a step to a flow: Hub Central Developer or Hub Central Curator
- To run a step: Hub Central Operator or Hub Central Curator
Alternatively, you can be assigned any role that inherits the required role. See Users and Roles.
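The role requirements above can be read as a lookup table. The following is a minimal sketch, assuming a simple in-memory model: the role names come from the list above, but `REQUIRED_ROLES`, `expand_roles`, `can_perform`, and the `inherits` dictionary are hypothetical names used for illustration, not a Data Hub API.

```python
# Illustrative only: role names are from the text; everything else is hypothetical.
STEP_AUTHOR_ROLES = {"Hub Central Developer", "Hub Central Curator"}

REQUIRED_ROLES = {
    "view": STEP_AUTHOR_ROLES,
    "create": STEP_AUTHOR_ROLES,
    "edit": STEP_AUTHOR_ROLES,
    "delete": STEP_AUTHOR_ROLES,
    "add_to_flow": STEP_AUTHOR_ROLES,
    "run": {"Hub Central Operator", "Hub Central Curator"},
}


def expand_roles(assigned, inherits):
    """Return the assigned roles plus every role they inherit, transitively."""
    seen = set()
    stack = list(assigned)
    while stack:
        role = stack.pop()
        if role not in seen:
            seen.add(role)
            stack.extend(inherits.get(role, ()))
    return seen


def can_perform(action, assigned, inherits=None):
    """True if any effective role satisfies the requirement for the action."""
    effective = expand_roles(assigned, inherits or {})
    return bool(REQUIRED_ROLES[action] & effective)
```

For example, `can_perform("run", {"Hub Central Developer"})` is false in this model, while a hypothetical custom role that inherits Hub Central Curator would pass every check.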
Curating Process
Data curation includes two main processes:
- Mapping your raw data models onto your entity model.
- Mastering your mapped data to identify and handle duplicates.
You can also create Custom steps to replace or supplement any of the default curation steps.
Each step consists of three sets of settings:
- Basic settings.
  - When creating a step, you set these values in the first dialog.
  - To update these values in an existing step, click the pencil icon for the step.
- Advanced settings. To update these values in an existing step, click Step Settings for the step.
- Specialized settings (mapping configuration, matching configuration, merging configuration). To update these values in an existing step, click the sliders icon for the step.
Managing Curation Steps in Hub Central
Step | How To
---|---
Mapping |
Matching |
Merging |
Custom | Custom Processing with Hub Central