Configure Matching Using Hub Central
Matching is defined by two types of objects.
- A match threshold specifies what action is triggered when the total match weight (degree of similarity) is within a defined range.
- A match ruleset defines the criteria for what is considered a match.
Before you begin
- Security role: Hub Central Developer or any role that inherits it. Learn more: Users and Roles
- Go to the Curate area of Hub Central.
- Go to your Hub Central endpoint.
Note: Skip this step if you are working in an on-premises environment, and go directly to clicking the Curate icon (Step 1b).
- In the icon bar, click the Curate icon.
- In the list, expand the entity type of the step to edit.
- Click the pencil icon for the step.
- Define the thresholds at which actions occur.
Under Configure your thresholds:
- Create a new threshold and specify the following settings:

| Name | Description |
| --- | --- |
| Name | The threshold rule name. |
| Action | What to do if the total weight exceeds the Weight threshold. The available actions are listed below. |

- Merge. Automatically merges the candidate records, according to the merging rules.
- Notify. Sends a notification for a human to review the match and decide on the action to take.
- Custom. Performs actions defined in a custom module.
- To position thresholds, click the Enable Threshold Scale toggle switch. Then move your threshold along the continuum from low to high to indicate the degree of matching that would trigger the selected action.
- A threshold at the LOW end would trigger the selected action if the compared entities match only slightly.
- A threshold at the HIGH end would trigger the selected action if the compared entities match with a high degree of certainty.
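
The relationship between the total match weight and the threshold actions can be sketched in a few lines of Python. This is only an illustrative model, not the Data Hub implementation: the `Threshold` class, the `select_action` function, the numeric weights, and the threshold names are all hypothetical, and the sketch assumes that a threshold is reached when the total weight is greater than or equal to it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Threshold:
    """Hypothetical model of one threshold on the scale."""
    name: str
    min_weight: float  # position on the LOW-to-HIGH scale (assumed numeric)
    action: str        # "merge", "notify", or "custom"


def select_action(total_weight: float,
                  thresholds: List[Threshold]) -> Optional[Threshold]:
    """Return the highest threshold that total_weight reaches, or None.

    Assumption: a threshold is reached when total_weight >= min_weight.
    """
    reached = [t for t in thresholds if total_weight >= t.min_weight]
    return max(reached, key=lambda t: t.min_weight, default=None)


# Hypothetical configuration: a lower Notify threshold and a higher Merge one.
thresholds = [
    Threshold("possible-match", 10.0, "notify"),
    Threshold("definite-match", 20.0, "merge"),
]
```

In this sketch, a pair of records with a total weight of 12 would be sent for human review, a pair with a total weight of 25 would be merged, and a pair with a total weight of 5 would trigger no action.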
- Create and prioritize your rulesets.
Under Place rulesets on a match scale:
- Create a new ruleset.
- To compare the values of a single property, select Add ruleset for a single property.

| Name | Description |
| --- | --- |
| Property to Match | The property whose values to compare. |
| Match Type | The matching type: Exact, Synonym, Double Metaphone, Zip, Reduce, and Custom. |

- Exact. Determines if the values of the specified entity property in two or more records are exactly the same.
- Synonym. Determines if the values of the specified entity property in two or more records are synonyms, according to the specified thesaurus.
- Double Metaphone. Determines if the values of the specified entity property in two or more records sound similar, based on the Double Metaphone algorithm. For example, "Smith" might sound like "Schmidt".
- Zip. Determines if the zip/postal code in two or more records match.
- Reduce. Reduces the significance of certain matches. For example, even if the addresses and last names of two records match, the similarity might not necessarily indicate that the two records refer to the same person, because they might be two members of the same family.
- Custom. Runs a function in your custom module to compare the values of a specified entity property in two or more records.

Additional settings for the Synonym match type

| Name | Description |
| --- | --- |
| Thesaurus URI | The location of the thesaurus that is stored in a MarkLogic Server database and used to determine synonyms. Learn more: Managing Thesaurus Documents |
| Filter | A node in the thesaurus to use as a filter. Learn more: the $filter parameter in thsr:expand. |

Additional settings for the Double Metaphone match type

| Name | Description |
| --- | --- |
| Dictionary URI | The location of the phonetic dictionary that is stored in a database and used when comparing words phonetically. Learn more: Custom Dictionaries |
| Distance Threshold | The threshold below which the phonetic difference (distance) between two strings is considered insignificant; i.e., the strings are similar to each other. Learn more: spell functions |
| Collation | The URI to the collation to use. A collation specifies the order for sorting strings. Learn more: Encodings and Collations |

Additional settings for the Custom match type

| Name | Description |
| --- | --- |
| URI | The location of the custom module. |
| Function | The name of the custom function within the custom module. |
| Namespace | |

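
To make the difference between some of the match types concrete, here is a rough Python sketch of Exact, Synonym, and Zip comparisons. It is not how MarkLogic evaluates them: the function names are hypothetical, the thesaurus is a plain in-memory dictionary rather than a thesaurus document in a database, and the Zip comparison assumes that two codes match when their first five digits agree.

```python
from typing import Dict, Set


def exact_match(a: str, b: str) -> bool:
    """Exact: the two property values are exactly the same."""
    return a == b


def synonym_match(a: str, b: str, thesaurus: Dict[str, Set[str]]) -> bool:
    """Synonym: the values are equal or listed as synonyms of each other.

    Assumption: the thesaurus maps a lowercase term to its lowercase synonyms.
    """
    a_l, b_l = a.lower(), b.lower()
    return (
        a_l == b_l
        or b_l in thesaurus.get(a_l, set())
        or a_l in thesaurus.get(b_l, set())
    )


def zip_match(a: str, b: str) -> bool:
    """Zip: compare the first five digits (assumed 5-digit vs. ZIP+4 rule)."""
    digits_a = "".join(ch for ch in a if ch.isdigit())
    digits_b = "".join(ch for ch in b if ch.isdigit())
    return len(digits_a) >= 5 and len(digits_b) >= 5 and digits_a[:5] == digits_b[:5]


# Hypothetical thesaurus entry used only for illustration.
thesaurus = {"bob": {"robert", "bobby"}}
```

With these definitions, "Robert" and "Bob" match under the Synonym sketch but not under Exact, and "94065-1234" matches "94065" under the Zip sketch.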
- To position rulesets, click the Enable Rulesets Scale toggle switch. Then move your ruleset along the continuum from low to high to indicate its importance or weight.
Note: If Reduce Weight is enabled for a ruleset, the total weight of all other rulesets is reduced by the relative (between LOW and HIGH) weight of that ruleset.
- A ruleset at the LOW end indicates that the ruleset has a low impact in determining whether the records match.
- A ruleset at the HIGH end indicates that the ruleset has a high impact in determining whether the records match.
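
One way to picture how ruleset weights combine, including the effect of Reduce Weight, is the Python sketch below. It is a simplified interpretation of the note above, not the actual scoring code: the `Ruleset` class, the `total_weight` function, the ruleset names, and the numeric weights are hypothetical, and the sketch assumes that each matching ruleset adds its weight while a matching Reduce ruleset subtracts its weight.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Ruleset:
    """Hypothetical model of one ruleset on the scale."""
    name: str
    weight: float         # position on the LOW-to-HIGH scale (assumed numeric)
    reduce: bool = False  # True if Reduce Weight is enabled


def total_weight(rulesets: Iterable[Ruleset], matched: Set[str]) -> float:
    """Sum the weights of the rulesets that matched.

    Assumption: a matching Reduce ruleset lowers the total by its weight.
    """
    total = 0.0
    for ruleset in rulesets:
        if ruleset.name in matched:
            total += -ruleset.weight if ruleset.reduce else ruleset.weight
    return total


# Hypothetical rulesets: two ordinary ones and one with Reduce Weight enabled.
rulesets = [
    Ruleset("lastName", 10.0),
    Ruleset("address", 5.0),
    Ruleset("sameHousehold", 4.0, reduce=True),
]
```

In this sketch, two records that match on last name and address score 15. If the Reduce ruleset also matches, the score drops to 11, which could move the pair below a Merge threshold and into Notify territory.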
What to do next
- Add the step to a new flow or an existing one.
Tip: You can add the step to multiple flows.
- Hover over the step tile.
- Click Add step to a new flow or select an existing flow under Add step to an existing flow.
- In the Run area, expand the flow and run the step.