Configure Matching Using Hub Central

About Matching

Matching is defined by two types of objects.

  • A match threshold specifies what action is triggered when the total match weight (degree of similarity) is within a defined range.
  • A match ruleset defines the criteria for what is considered a match.

Before you begin

You need:

  • Security role: Hub Central Developer or any role that inherits it. Learn more: Users and Roles

Procedure

  1. Go to the Curate area of Hub Central.
    1. Go to your Hub Central endpoint.
      Note: Skip this substep if you are working from an on-prem environment, and go directly to the next substep.
    2. In the icon bar, click the Curate icon.

  2. In the list, expand the entity type that contains the matching step you want to edit.
  3. Click the pencil icon for the step.

Match Thresholds
  1. Define the thresholds at which actions are triggered.

    Under Configure your thresholds:
    1. To create a new threshold, click Add.

      Add Match Threshold - Custom action

      Setting    Description
      Name       The name of the threshold.
      Action     The action to take when the total match weight exceeds the threshold:
                 • Merge. Automatically merges the candidate records, according to the merging rules.
                 • Notify. Sends a notification so that a person can review the match and decide what action to take.
                 • Custom. Performs the actions defined in a custom module.
      Additional settings for the Custom action:
      URI        The location of the custom module.
      Function   The name of the custom function within the custom module.
      Namespace  The namespace of the library module that contains the custom function. Leave blank if the custom function is written in JavaScript.
    2. To position thresholds, click the Enable Threshold Scale toggle switch. Then move each threshold along the continuum from LOW to HIGH to set the degree of matching that triggers its action.
      • A threshold at the LOW end triggers its action even when the compared records match only slightly.
      • A threshold at the HIGH end triggers its action only when the compared records match with a high degree of certainty.
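
The threshold scale can be sketched in Python as follows. This is a minimal sketch, not Hub Central's implementation: it assumes that each threshold position corresponds to a numeric weight, that a threshold is reached when the total match weight exceeds it, and that when several thresholds are reached, the highest one determines the action. The `Threshold` type, the threshold names, and the weights are hypothetical.

```python
from typing import NamedTuple, Optional


class Threshold(NamedTuple):
    """Hypothetical threshold: a name, an action, and a position on the weight scale."""
    name: str
    action: str  # "merge", "notify", or "custom"
    weight: float


def triggered_action(thresholds: list[Threshold], total: float) -> Optional[str]:
    """Return the action of the highest threshold that `total` exceeds, or None."""
    exceeded = [t for t in thresholds if total > t.weight]
    if not exceeded:
        return None
    return max(exceeded, key=lambda t: t.weight).action


thresholds = [
    Threshold("Possible match", "notify", 5),
    Threshold("Definite match", "merge", 12),
]
print(triggered_action(thresholds, 6))   # notify
print(triggered_action(thresholds, 16))  # merge
print(triggered_action(thresholds, 3))   # None

# Moving "Possible match" toward the LOW end makes a weak match trigger it.
low = [Threshold("Possible match", "notify", 2), thresholds[1]]
print(triggered_action(low, 3))          # notify
```

Placing a threshold lower on the scale widens the range of total weights that trigger its action: more candidate pairs are acted on, at the cost of more pairs that are not true duplicates.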

Match Rulesets
  1. Create and prioritize your rulesets.

    Under Place rulesets on a match scale:
    1. To create a new ruleset, click Add.
    2. To compare the values of a single property, select Add ruleset for a single property.

      Add Match Ruleset for a Single Property

      Setting             Description
      Property to Match   The property whose values are compared.
      Match Type          The matching type: Exact, Synonym, Double Metaphone, Zip, Reduce, or Custom.
                          • Exact. Determines whether the values of the specified entity property in two or more records are exactly the same.
                          • Synonym. Determines whether the values of the specified entity property in two or more records are synonyms, according to the specified thesaurus.
                          • Double Metaphone. Determines whether the values of the specified entity property in two or more records sound similar, based on the Double Metaphone algorithm. For example, "Smith" might sound like "Schmidt".
                          • Zip. Determines whether the zip/postal codes in two or more records match.
                          • Reduce. Reduces the significance of certain matches. For example, even if the addresses and last names of two records match, the similarity does not necessarily mean that both records refer to the same person; they might belong to two members of the same family.
                          • Custom. Runs a function in your custom module to compare the values of the specified entity property in two or more records.
      Values to Ignore    The property values that the match ruleset ignores when matching.
      Fuzzy Match         Also matches values that sound alike (double metaphones).
      Additional settings for the Synonym match type:
      Thesaurus URI       The location of the thesaurus, stored in a MarkLogic Server database, that is used to determine synonyms. Learn more: Managing Thesaurus Documents
      Filter              A node in the thesaurus to use as a filter; for example, <thsr:qualifier>birds</thsr:qualifier>. Learn more: the $filter parameter in thsr:expand.
      Additional settings for the Double Metaphone match type:
      Dictionary URI      The location of the phonetic dictionary, stored in a database, that is used when comparing words phonetically. Learn more: Custom Dictionaries
      Distance Threshold  The threshold below which the phonetic difference (distance) between two strings is considered insignificant; that is, the strings are considered similar. Learn more: spell functions
      Collation           The URI of the collation to use. A collation specifies the order for sorting strings. Learn more: Encodings and Collations
      Additional settings for the Custom match type:
      URI                 The location of the custom module.
      Function            The name of the custom function within the custom module.
      Namespace           The namespace of the library module that contains the custom function. Leave blank if the custom function is written in JavaScript.
    3. To position rulesets, click the Enable Rulesets Scale toggle switch. Then move your ruleset along the continuum from low to high to indicate its importance or weight.
      • A ruleset at the LOW end indicates that the ruleset has a low impact in determining whether the records match.
      • A ruleset at the HIGH end indicates that the ruleset has a high impact in determining whether the records match.
      Note: If Reduce Weight is enabled for a ruleset, the total weight of all other rulesets is reduced by that ruleset's relative weight (its position between LOW and HIGH).
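
The single-property match types can be illustrated with small comparison functions. This Python sketch is an illustration under stated assumptions, not the Data Hub matching code: the thesaurus is a hypothetical in-memory dictionary rather than a thesaurus document in the database, the Zip comparison assumes US ZIP and ZIP+4 codes and compares only the first five digits, and the Distance Threshold check swaps in plain Levenshtein edit distance on the raw strings instead of comparing Double Metaphone encodings.

```python
def exact_match(a: str, b: str) -> bool:
    """Exact: the two values are identical (case-sensitive)."""
    return a == b


def synonym_match(a: str, b: str, thesaurus: dict[str, set[str]]) -> bool:
    """Synonym: equal values, or values listed as synonyms of each other.

    `thesaurus` is a hypothetical in-memory stand-in for a thesaurus document.
    """
    return a == b or b in thesaurus.get(a, set()) or a in thesaurus.get(b, set())


def zip_match(a: str, b: str) -> bool:
    """Zip (assumed US codes): compare the first five digits, so ZIP+4 matches ZIP."""
    digits_a = "".join(ch for ch in a if ch.isdigit())
    digits_b = "".join(ch for ch in b if ch.isdigit())
    return len(digits_a) >= 5 and digits_a[:5] == digits_b[:5]


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def within_distance(a: str, b: str, distance_threshold: int) -> bool:
    """Stand-in for Distance Threshold: similar if the distance is below the threshold."""
    return levenshtein(a.lower(), b.lower()) < distance_threshold


thesaurus = {"Bob": {"Robert"}}
print(synonym_match("Robert", "Bob", thesaurus))  # True
print(zip_match("94070-1234", "94070"))           # True
print(levenshtein("Smith", "Smyth"))              # 1
print(within_distance("Smith", "Smyth", 2))       # True
```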
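
The effect of a Reduce ruleset on the total match weight can be sketched as follows. This minimal Python sketch assumes that ruleset positions on the scale correspond to numeric weights, that a ruleset matches when all of its properties have equal, non-empty values in both records, and that a matching Reduce ruleset subtracts its weight from the total. The `WeightedRuleset` type, the property names, and the weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedRuleset:
    """Hypothetical ruleset: properties that must all match exactly, a weight, and a Reduce flag."""
    name: str
    props: tuple[str, ...]
    weight: float
    reduce: bool = False


def matches(rs: WeightedRuleset, a: dict, b: dict) -> bool:
    """A ruleset matches when every listed property is present and equal in both records."""
    return all(a.get(p) is not None and a.get(p) == b.get(p) for p in rs.props)


def total_weight(rulesets: list[WeightedRuleset], a: dict, b: dict) -> float:
    """Add weights of matching regular rulesets; subtract weights of matching Reduce rulesets."""
    total = 0.0
    for rs in rulesets:
        if matches(rs, a, b):
            total += -rs.weight if rs.reduce else rs.weight
    return total


rulesets = [
    WeightedRuleset("lastName", ("lastName",), 4),
    WeightedRuleset("address", ("address",), 6),
    WeightedRuleset("household", ("lastName", "address"), 5, reduce=True),
]
parent = {"firstName": "Ann", "lastName": "Smith", "address": "1 Main St"}
child = {"firstName": "Tom", "lastName": "Smith", "address": "1 Main St"}
print(total_weight(rulesets, parent, child))  # 5.0
```

The two records share a last name and an address, which alone would score 10. The hypothetical household Reduce ruleset lowers the total to 5, so the pair is less likely to reach a high threshold such as Merge.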

What to do next

  1. Add the step to a new flow or an existing one.
    1. Hover over the step tile.
    2. Click Add step to a new flow or select an existing flow under Add step to an existing flow.
    Tip: You can add the step to multiple flows.
  2. In the Run area, expand the flow and run the step.