Provenance for Mastering
Mastering Summary
A mastering step or a merging step generates a mastering summary that contains the following information:
- Job ID
- Number of matches found
- Number of merged records created
- List of entity properties that matched between any two records
- List of matching algorithms that resulted in a match between any two records
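The fields listed above can be pictured as a small summary object built from pairwise match results. The sketch below is illustrative only. The input shape (`uriA`, `uriB`, `properties`, `algorithms`) and the output field names are assumptions, not the actual JobReport schema. It also assumes that each matched pair of records counts as one match.

```javascript
// Hypothetical sketch: aggregate pairwise match results into a summary object.
// Input and output field names are illustrative assumptions, not the Data Hub
// JobReport schema. Each matched pair is counted as one match (assumption).
function summarizeMastering(jobId, matchPairs, mergedRecordCount) {
  const properties = new Set(); // distinct entity properties that matched
  const algorithms = new Set(); // distinct algorithms that produced a match
  for (const pair of matchPairs) {
    pair.properties.forEach((p) => properties.add(p));
    pair.algorithms.forEach((a) => algorithms.add(a));
  }
  return {
    jobID: jobId,
    matchCount: matchPairs.length,
    mergedCount: mergedRecordCount,
    matchedProperties: [...properties].sort(),
    matchAlgorithms: [...algorithms].sort(),
  };
}

// Usage with made-up data: two matched pairs that produced two merged records.
const summary = summarizeMastering('job-123', [
  { uriA: '/customer/1.json', uriB: '/customer/2.json', properties: ['lastName', 'zip'], algorithms: ['Exact', 'Zip'] },
  { uriA: '/customer/3.json', uriB: '/customer/4.json', properties: ['lastName'], algorithms: ['Synonym'] },
], 2);
console.log(summary);
```

Sets are used so that a property or algorithm that appears in several matched pairs is listed only once.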
The mastering summary is stored in the data-hub-JOBS database, in the JobReport collection.
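The following example query retrieves the mastering summary for a specific job. Run it against the data-hub-JOBS database, replacing <jobID> with the ID of the job:
    // Find JobReport documents whose jobID property equals the given job ID.
    cts.search(cts.andQuery([
      cts.collectionQuery('JobReport'),
      cts.jsonPropertyValueQuery('jobID', '<jobID>')
    ]))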
Note: The mastering summary is generated regardless of the provenanceGranularityLevel setting; however, the example query above returns an empty list if provenanceGranularityLevel is set to off.
Provenance with Fine Granularity
If provenanceGranularityLevel is set to fine, additional information is tracked.
Match
For each record that matches one or more other records, the following provenance information is tracked:
- Job ID
- Identifiers of the record in focus and of the records that matched it (that is, the URI and the properties marked as primary keys)
- Merge threshold score
- Total score
- For each matching property:
  - The property that matched
  - The values that matched
  - The score for the match
  - The matching algorithm that was triggered (e.g., Exact, Synonym, Zip)
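To make the tracked fields concrete, the sketch below builds one fine-grained match entry for a focus record and one matching record. It is a hypothetical illustration, not the Data Hub provenance format: the function name, the field names, and the one-entry-per-pair layout are assumptions. It also assumes that the total score is the sum of the per-property scores and that a merge is indicated when the total reaches the merge threshold.

```javascript
// Hypothetical sketch of a fine-grained match provenance entry.
// Assumptions: one entry per focus/matched pair; totalScore is the sum of the
// per-property scores; the merge threshold is met when totalScore >= threshold.
function buildMatchProvenance(jobId, focus, matched, propertyMatches, mergeThreshold) {
  const totalScore = propertyMatches.reduce((sum, m) => sum + m.score, 0);
  return {
    jobID: jobId,
    focusRecord: focus,     // URI plus primary-key properties
    matchedRecord: matched, // URI plus primary-key properties
    mergeThreshold,
    totalScore,
    meetsMergeThreshold: totalScore >= mergeThreshold,
    // One item per matching property: property, values, score, algorithm.
    matchedProperties: propertyMatches.map((m) => ({
      property: m.property,
      values: m.values,
      score: m.score,
      algorithm: m.algorithm,
    })),
  };
}

// Usage with made-up records and scores.
const entry = buildMatchProvenance(
  'job-123',
  { uri: '/customer/1.json', primaryKey: { customerId: 'C1' } },
  { uri: '/customer/2.json', primaryKey: { customerId: 'C2' } },
  [
    { property: 'lastName', values: ['Smith', 'Smith'], score: 10, algorithm: 'Exact' },
    { property: 'zip', values: ['94070', '94070-1234'], score: 5, algorithm: 'Zip' },
  ],
  12
);
console.log(JSON.stringify(entry, null, 2));
```

Here the two property scores add up to 15, which meets the assumed merge threshold of 12.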
Merge
For each merge, the following provenance information is tracked:
- Paths to the original records that provided the value(s) for each merged property
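The merge provenance described above can be sketched as a merge that remembers where each value came from. This is a hypothetical illustration: the function name, the `{ uri, doc }` input shape, and the JSON-pointer-style paths are assumptions. It keeps every distinct value for each property, whereas real Data Hub merge strategies may limit or prioritize values.

```javascript
// Hypothetical sketch: merge flat JSON records and track, for each merged
// property, which source record (URI + property path) provided each value.
// Keeping every distinct value is an assumption; real merge strategies may
// cap or prioritize values. includes() compares primitive values only.
function mergeWithProvenance(sources) {
  const merged = {};
  const provenance = {};
  for (const { uri, doc } of sources) {
    for (const [prop, value] of Object.entries(doc)) {
      merged[prop] = merged[prop] || [];
      provenance[prop] = provenance[prop] || [];
      if (!merged[prop].includes(value)) merged[prop].push(value);
      // Every record that supplied this value is recorded, even duplicates.
      provenance[prop].push({ uri, path: `/${prop}`, value });
    }
  }
  return { merged, provenance };
}

// Usage with two made-up source records.
const result = mergeWithProvenance([
  { uri: '/customer/1.json', doc: { lastName: 'Smith', zip: '94070' } },
  { uri: '/customer/2.json', doc: { lastName: 'Smith', zip: '94070-1234' } },
]);
console.log(JSON.stringify(result, null, 2));
```

Both records are listed as sources of lastName because both supplied the value "Smith", while zip keeps one value from each record.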