Provenance for Mastering

Mastering Summary

A mastering step or a merging step generates a mastering summary that contains the following information:

  • Job ID
  • Number of matches found
  • Number of merged records created
  • List of entity properties that matched between any two records
  • List of matching algorithms that resulted in a match between any two records

The mastering summary is stored in the data-hub-JOBS database, in the JobReport collection.

An example query to retrieve the mastering summary for a specific job:

  // Run against the data-hub-JOBS database, where the summary is stored.
  cts.search(cts.andQuery([
    cts.collectionQuery('JobReport'),
    cts.jsonPropertyValueQuery('jobID', '<jobID>')
  ]))

Note: The mastering summary is generated regardless of the provenanceGranularityLevel setting. However, the example query above returns an empty list if provenanceGranularityLevel is set to off.
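
If the same lookup is needed for several jobs, the query can be wrapped in a small function. The following is a minimal sketch, not part of Data Hub: the function name findMasteringSummary and the choice to pass cts as a parameter are assumptions made so the function can also be exercised outside MarkLogic. Inside MarkLogic, pass the built-in cts global.

```javascript
// Sketch only: findMasteringSummary is a hypothetical helper, not a Data Hub API.
// `cts` is passed in explicitly; in MarkLogic, supply the built-in `cts` global.
function findMasteringSummary(cts, jobId) {
  // Reject missing or empty job IDs before building the query.
  if (typeof jobId !== 'string' || jobId.length === 0) {
    throw new Error('jobId must be a non-empty string');
  }
  // Same query as the example above, with the job ID supplied by the caller.
  return cts.search(cts.andQuery([
    cts.collectionQuery('JobReport'),
    cts.jsonPropertyValueQuery('jobID', jobId)
  ]));
}
```

As with the example query, the call findMasteringSummary(cts, '<jobID>') must be evaluated against the data-hub-JOBS database.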

Provenance with Fine Granularity

If provenanceGranularityLevel is set to fine, additional information is tracked.

Match

For each record that matches one or more other records, the following provenance information is tracked:

  • Job ID
  • Identifiers of the record that was in focus and of the records that matched it; that is, the URI and the properties marked as primary keys
  • Merge threshold score
  • Total score
  • For each matching property:
    • The property that matched
    • The values that matched
    • The score for the match
    • The matching algorithm that was triggered (e.g., Exact, Synonym, Zip)
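
To make the relationship between these items concrete, the sketch below models one match entry as a plain object. It is illustrative only: the field names, URIs, and scores are invented and do not reflect the actual provenance schema. It also assumes that the total score is the sum of the per-property scores and that a pair qualifies for merging when the total reaches the merge threshold.

```javascript
// Hypothetical shape of one match provenance entry.
// All field names and values are illustrative assumptions, not the real schema.
const matchEntry = {
  jobId: 'example-job-id',
  focusUri: '/customers/1.json',
  matchedUri: '/customers/2.json',
  mergeThreshold: 20,
  propertyMatches: [
    { property: 'lastName', values: ['Smith', 'Smith'], score: 10, algorithm: 'Exact' },
    { property: 'firstName', values: ['Bob', 'Robert'], score: 8, algorithm: 'Synonym' },
    { property: 'postalCode', values: ['94070', '94070-1234'], score: 5, algorithm: 'Zip' }
  ]
};

// Assumption: the total score is the sum of the per-property match scores.
function totalScore(entry) {
  return entry.propertyMatches.reduce((sum, match) => sum + match.score, 0);
}

// Assumption: a pair qualifies for merging when the total reaches the threshold.
function meetsMergeThreshold(entry) {
  return totalScore(entry) >= entry.mergeThreshold;
}
```

With the sample values, the three property scores add up to 23, which is above the assumed threshold of 20.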

Merge

For each merge, the following provenance information is tracked:

  • Paths to the original records that provided the value(s) for each merged property
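
As a rough illustration of how this information can be used, the sketch below maps each merged property to the source records and paths that supplied its value, then looks up the contributing documents for one property. The object layout, URIs, paths, and the helper name sourceUrisFor are all hypothetical; the actual provenance records may be structured differently.

```javascript
// Hypothetical merge provenance: merged property -> sources of its value(s).
// The layout, URIs, and paths are illustrative assumptions only.
const mergeProvenance = {
  lastName: [
    { uri: '/customers/1.json', path: '/envelope/instance/Customer/lastName' }
  ],
  email: [
    { uri: '/customers/1.json', path: '/envelope/instance/Customer/email' },
    { uri: '/customers/2.json', path: '/envelope/instance/Customer/email' }
  ]
};

// Return the distinct source document URIs that contributed to a merged property.
// Properties with no recorded provenance yield an empty array.
function sourceUrisFor(provenance, property) {
  const sources = provenance[property] || [];
  return [...new Set(sources.map((source) => source.uri))];
}
```

For the sample data, sourceUrisFor(mergeProvenance, 'email') lists both customer documents, while a property with no recorded sources yields an empty array.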