Provenance and Lineage

Overview

In MarkLogic, provenance tracks the origin of the data and lineage is the history of the data. Provenance metadata is the combined set of provenance information and lineage information tracked by MarkLogic Data Hub. Provenance information is updated with every change made to a record, from ingestion through the rest of its lifetime in MarkLogic Server.

All provenance and lineage information is stored as XML documents (using the PROV XML schema) in the data-hub-JOBS database and is added to the protected collection http://marklogic.com/provenance-services/record. When provenance and lineage records are created, triples that define the relationships among the pieces of information are also generated.
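
As an illustration of what a PROV XML record expresses, the following minimal Python sketch extracts derivation relationships from a small PROV-XML document. The sample record and its ex: identifiers are invented for this example and are not actual Data Hub output; only the element and attribute names (prov:document, prov:entity, prov:wasDerivedFrom, prov:generatedEntity, prov:usedEntity, prov:ref) come from the W3C PROV-XML vocabulary. The records Data Hub writes may contain more, or different, statements.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"
NS = {"prov": PROV_NS}

# Hypothetical PROV-XML record; the ex: identifiers are invented for illustration.
SAMPLE_RECORD = """\
<prov:document xmlns:prov="http://www.w3.org/ns/prov#" xmlns:ex="http://example.org/">
  <prov:entity prov:id="ex:customer-1-mapped"/>
  <prov:entity prov:id="ex:customer-1-raw"/>
  <prov:wasDerivedFrom>
    <prov:generatedEntity prov:ref="ex:customer-1-mapped"/>
    <prov:usedEntity prov:ref="ex:customer-1-raw"/>
  </prov:wasDerivedFrom>
</prov:document>
"""


def derivations(prov_xml: str) -> list[tuple[str, str]]:
    """Return (generated, used) entity identifier pairs from a PROV-XML document."""
    root = ET.fromstring(prov_xml)
    ref_attr = f"{{{PROV_NS}}}ref"  # namespaced attribute name for prov:ref
    pairs = []
    for derivation in root.iter(f"{{{PROV_NS}}}wasDerivedFrom"):
        generated = derivation.find("prov:generatedEntity", NS)
        used = derivation.find("prov:usedEntity", NS)
        # Skip incomplete statements instead of failing on them.
        if generated is not None and used is not None:
            pairs.append((generated.get(ref_attr), used.get(ref_attr)))
    return pairs


if __name__ == "__main__":
    for generated_id, used_id in derivations(SAMPLE_RECORD):
        print(f"{generated_id} was derived from {used_id}")
```

Each pair is the kind of relationship among pieces of provenance information that the generated triples describe.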

You can view provenance information using the Query Console.

Security

To read provenance and lineage information, you must be assigned the following security role:

  • ps-user
    • Permits an assigned user to read provenance and lineage information in the JOBS database.
    • Inherited by the data-hub-developer and the flow-developer-role roles.
      Tip: You can assign the ps-user role to non-developer user accounts that need to read provenance records.

    This role is provided by MarkLogic Server.

Provenance Granularity

Data Hub provides three levels of granularity for provenance information: coarse (default), fine, and off. Each level corresponds to a value of provenanceGranularityLevel:

  • "provenanceGranularityLevel" : "coarse"
    • Document-level provenance information is tracked.
  • "provenanceGranularityLevel" : "fine"
    • Property-level provenance information is also tracked.
  • "provenanceGranularityLevel" : "off"
    • Provenance information for the current flow or step is not stored.
      CAUTION: Do not turn off provenance unless you are certain the project will never make use of provenance information.

Additional details depend on the step type:

  • In a mapping step, provenance information includes every entity property and the XPath of the source field mapped to it.
  • In a matching, merging, or mastering step, additional provenance information is tracked.
  • In a custom step, the set of property-level provenance information is customizable. The set of document-level provenance information is not customizable.
  • For all other step types, the set of provenance information is not customizable.

Regardless of the value of provenanceGranularityLevel:

  • Every merged record contains provenance information from all its source records. If provenanceGranularityLevel is coarse or fine, the merged record also contains the provenance information for the mastering step run.
  • The mastering summary is created as part of a mastering step or a merging step, but not a matching step.

Only a Jobs document is created. No other provenance or lineage information is tracked.

In your custom step module, you can add code to generate document-level and property-level provenance. See Editing a Custom Step Module.

Even if provenance tracking is turned off for the flow or the step, previously collected provenance information is retained. Database administrator privileges are required to delete existing provenance information.
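
The granularity setting described under Provenance Granularity can be sketched as a small validation helper. This is a minimal Python sketch, not part of Data Hub: the function name with_provenance_level, the idea of keeping the setting in an options dictionary, and the sample "collections" entry are assumptions made for illustration. Only the property name provenanceGranularityLevel and its three values (coarse, fine, off) come from this page.

```python
import json

# Values and default documented on this page.
ALLOWED_LEVELS = ("coarse", "fine", "off")
DEFAULT_LEVEL = "coarse"


def with_provenance_level(options: dict, level: str = DEFAULT_LEVEL) -> dict:
    """Return a copy of options with provenanceGranularityLevel set to level.

    Hypothetical helper: where the options object lives in a flow or step
    definition is an assumption, not something this page specifies.
    """
    if level not in ALLOWED_LEVELS:
        raise ValueError(
            f"provenanceGranularityLevel must be one of {ALLOWED_LEVELS}, got {level!r}"
        )
    updated = dict(options)  # leave the caller's dictionary unchanged
    updated["provenanceGranularityLevel"] = level
    return updated


if __name__ == "__main__":
    # "collections" is a placeholder entry standing in for existing options.
    step_options = with_provenance_level({"collections": ["Customer"]}, "fine")
    print(json.dumps(step_options, indent=2))
```

Rejecting unknown values early avoids silently running a step with a misspelled level, and defaulting to coarse mirrors the documented default.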