Fuzzy Matching logic

Clinton Jones, Experian Super Contributor

How does the fuzzy matching in the Find Duplicates step work?

Comments

  • Akshay Davis, Administrator

    The rules in Find Duplicates let you build up detailed match classification rules based on individual components. Each of these components, such as Forename or Street Name, can be configured so that acceptable differences are classified into one of the four match levels.

    The individual comparison functions are listed in the rules documentation. They range from standard comparisons, such as Levenshtein edit distance (essentially the number of character edits needed to turn one string into the other) and Jaro-Winkler similarity, to specific comparison functions for elements like postcodes, where we want to apply additional logic.
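    To make the idea of an edit distance concrete, here is a minimal Python sketch of the textbook Levenshtein algorithm, plus one common way of turning it into a percentage-style score. It is an illustration only and not the product's implementation; the function names `levenshtein` and `similarity`, and the choice to normalise by the longer string's length, are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (textbook dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 when every
    character of the longer string would have to change."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

    For example, `levenshtein("Jon", "John")` counts a single inserted character, while a score from `similarity` can be compared against a percentage threshold.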

    These comparisons are combined into overall rules. A pseudo example is shown below:

    • A name is a candidate for manual review if
      • Either of the following holds for the forenames
        • They differ by 3 characters
        • OR
        • Their root names (e.g. John -> Johnathan and Jon -> Jonathan) are the same
      • AND
      • The surnames have an edit-distance similarity of 90% or higher

    Rules like this can be layered up across the four match levels to provide as much control as needed.
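    The pseudo rule above could be sketched in Python roughly as follows. This is a hedged illustration rather than how Find Duplicates is implemented: the `Name` class, the tiny `ROOT_NAMES` lookup, the function names, and the reading of "3 characters" as "at most 3 edits" are all assumptions made for the example, and the surname score is an edit distance normalised by the longer surname's length.

```python
from dataclasses import dataclass

# Hypothetical root-name lookup; a real one would be far larger.
ROOT_NAMES = {"john": "johnathan", "jon": "jonathan"}


@dataclass
class Name:
    forename: str
    surname: str


def edit_distance(a: str, b: str) -> int:
    """Compact Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def root_name(forename: str) -> str:
    """Map a forename to its root form, falling back to the name itself."""
    key = forename.strip().lower()
    return ROOT_NAMES.get(key, key)


def is_manual_review_candidate(a: Name, b: Name,
                               max_forename_edits: int = 3,
                               min_surname_similarity: float = 0.90) -> bool:
    f1, f2 = a.forename.strip().lower(), b.forename.strip().lower()
    s1, s2 = a.surname.strip().lower(), b.surname.strip().lower()

    # Forename condition: close enough in edits OR the same root name.
    # Assumption: "3 characters" is read as "at most 3 edits".
    forename_ok = (edit_distance(f1, f2) <= max_forename_edits
                   or root_name(f1) == root_name(f2))

    # Surname condition: normalised edit-distance score of 90% or higher.
    longest = max(len(s1), len(s2))
    surname_score = 1.0 if longest == 0 else 1.0 - edit_distance(s1, s2) / longest
    surname_ok = surname_score >= min_surname_similarity

    return forename_ok and surname_ok
```

    A separate function of this shape per match level, checked from the strictest level down, is one way to picture how the rules are layered.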
