Curating with Hub Central

Overview

A typical Data Hub data flow involves the following operations:
  1. Load (ingest) your raw data into MarkLogic Server.
  2. Create an entity model to standardize your data fields.
  3. Map the fields in your raw data to the fields of the entity model.
  4. (Optional) Match and merge duplicates.

Security

You must be assigned one of the following security roles, depending on the task:

  • To view, create, edit, or delete a step: Hub Central Developer or Hub Central Curator
  • To add a step to a flow: Hub Central Developer or Hub Central Curator
  • To run a step: Hub Central Operator or Hub Central Curator

Alternatively, you can be assigned any role that inherits the required role. See Users and Roles.
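
The role requirements above, including the inheritance rule, can be sketched as a small lookup. This is an illustration only, not a Hub Central or MarkLogic API: the action keys, the function names, and the "Data Steward" role in the usage note are hypothetical, and the role names are the display names used in the list above.

```python
# Hypothetical lookup table built from the role list above; not a Hub Central API.
REQUIRED_ROLES = {
    "manage_step": {"Hub Central Developer", "Hub Central Curator"},  # view, create, edit, delete
    "add_step_to_flow": {"Hub Central Developer", "Hub Central Curator"},
    "run_step": {"Hub Central Operator", "Hub Central Curator"},
}


def expand_roles(assigned, inherits):
    """Return the assigned roles plus every role they inherit, directly or indirectly."""
    seen = set()
    stack = list(assigned)
    while stack:
        role = stack.pop()
        if role not in seen:
            seen.add(role)
            stack.extend(inherits.get(role, ()))
    return seen


def can_perform(action, assigned, inherits=None):
    """True if any assigned or inherited role is one of the roles required for the action."""
    effective = expand_roles(assigned, inherits or {})
    return bool(effective & REQUIRED_ROLES[action])
```

For example, assuming a custom "Data Steward" role that inherits Hub Central Operator, `can_perform("run_step", {"Data Steward"}, {"Data Steward": ["Hub Central Operator"]})` evaluates to `True`, while `can_perform("manage_step", {"Hub Central Operator"})` evaluates to `False`.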

Curating Process

Data curation includes two main processes:

  • Mapping your raw data models onto your entity model.
  • Mastering your mapped data to identify and handle duplicates.
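
As a conceptual illustration of these two processes, the toy sketch below renames raw fields to entity-model properties (mapping), then groups and combines duplicates (mastering). It is plain Python, not Hub Central's mapping or mastering engine; the field names, the email-based match rule, and the "first non-empty value wins" merge rule are all assumptions chosen for the example.

```python
# Toy illustration only: plain Python, not Hub Central's mapping or mastering engine.

# Hypothetical mapping from raw field names to entity-model property names.
FIELD_MAP = {"fname": "firstName", "lname": "lastName", "mail": "email"}


def map_record(raw):
    """Mapping: rename raw fields to the entity model's properties."""
    return {FIELD_MAP[k]: v for k, v in raw.items() if k in FIELD_MAP}


def match_records(records, key="email"):
    """Matching: group records that share the same normalized key value."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[key].strip().lower(), []).append(rec)
    return list(groups.values())


def merge_group(group):
    """Merging: combine a group into one record, keeping the first non-empty value."""
    merged = {}
    for rec in group:
        for prop, value in rec.items():
            if value and not merged.get(prop):
                merged[prop] = value
    return merged


raw_data = [
    {"fname": "Ana", "lname": "", "mail": "ana@example.com"},
    {"fname": "Ana", "lname": "Silva", "mail": "ANA@example.com "},
    {"fname": "Bo", "lname": "Chen", "mail": "bo@example.com"},
]

mapped = [map_record(r) for r in raw_data]
curated = [merge_group(g) for g in match_records(mapped)]
```

Here the two "Ana" records match on their normalized email and merge into one record, so `curated` holds two records instead of three.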

You can also create Custom steps to replace or supplement any of the default curation steps.

Each step consists of three sets of settings:

  • Basic settings.
    • When creating a step, you set these values in the first dialog.
    • To update these values in an existing step, click the pencil icon for the step.
  • Advanced settings. To update these values in an existing step, click Manage Queries for the step.
  • Specialized settings (mapping configuration, matching configuration, merging configuration). To update these values in an existing step, click the sliders icon for the step.

Managing Curation Steps in Hub Central

Step: Mapping, Matching, Merging
How To:
  • Add the step to a flow and run it.
    Important: The matching step must be run before the merging step.
  • Manage the step in the flow.
  • To delete a step:
    1. Go to the Curate area of Hub Central.
      a. Go to your Hub Central endpoint.
        Note: Disregard this step if you are working from an on-premises environment. See Step 1b.
      b. In the icon bar, click the Curate icon.
    2. Expand the entity type used by the step.
    3. Click the tab for the appropriate step type: Map, Match, or Merge.
    4. Locate the step and click the trash icon at the bottom of the step's tile.

Step: Custom
How To: See Custom Processing with Hub Central.
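
The ordering rule in the Important note above (matching before merging) can be expressed as a simple pre-run check. This is a minimal sketch under assumptions: the function name and the lowercase step-type strings are hypothetical, and Hub Central is not stated to expose such a check.

```python
def check_step_order(step_types):
    """Return a list of problems found in a flow's ordered step types.

    step_types is a hypothetical list such as ["mapping", "matching", "merging"],
    given in the order the steps would run.
    """
    problems = []
    matched = False
    for position, step_type in enumerate(step_types, start=1):
        if step_type == "matching":
            matched = True
        elif step_type == "merging" and not matched:
            problems.append(f"step {position}: merging runs before any matching step")
    return problems
```

An empty list means the order satisfies the rule. For instance, `check_step_order(["mapping", "merging", "matching"])` reports a problem at step 2, because the merging step precedes the matching step.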