Find Duplicates - Standardize Name and Address

Sueann See, Experian Super Contributor
edited December 2023 in Tips and tricks

When setting up your Find Duplicates step, the match column lets you map each of your columns to a system-defined Find Duplicates tag. These tags determine how each column is processed when Find Duplicates is run.

For Names and Addresses, you have the option to either use the parent Name and Address tags or use the child/more specific Name and Address component tags.

Note: If you have tagged your data earlier, this step will attempt to match the columns based on those tags, but you can still change the mapping.


When you use the parent tags Name and Address instead of assigning the more specific child tags, the Find Duplicates standardization process will automatically segment the Name and Address into components for matching purposes. This is recommended when you do not have a high degree of confidence that your existing data is classified correctly. For example, consider whether a Forename value could end up in a Surname column, or a State value in a City column. This approach is also useful when the data you have collected holds the Name or Address in a single column.
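To make the idea of segmentation concrete, here is a minimal Python sketch of splitting a single Name value into components. It is not how Find Duplicates does it internally; the `segment_name` function, the `TITLE_GENDER` lookup and the sample surname are all made up for illustration, and real standardization handles far more cases.

```python
# Illustrative only: a naive segmenter, not the Find Duplicates implementation.
# TITLE_GENDER is a hypothetical lookup used to infer Gender from a Title.
TITLE_GENDER = {"MR": "M", "MRS": "F", "MS": "F", "MISS": "F"}


def segment_name(full_name):
    """Split a single Name value into Title, Forenames, Surname and Gender."""
    tokens = full_name.replace(".", "").split()

    # Treat the first token as a Title only if it is in the lookup.
    title = ""
    if tokens and tokens[0].upper() in TITLE_GENDER:
        title = tokens.pop(0)

    # Assume the last remaining token is the Surname, the rest are Forenames.
    surname = tokens.pop() if tokens else ""

    return {
        "title": title,
        "forenames": " ".join(tokens),
        "surname": surname,
        "gender": TITLE_GENDER.get(title.upper(), ""),
    }


# "Smith" is an invented sample surname.
print(segment_name("Mr. Abram Smith"))
```

Once the name is split like this, each element can be compared on its own rather than comparing the whole string.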

Once you run the workflow with Find Duplicates and retain your duplicate store, you can use the Find Duplicates Workbench to view each record in its standardized form. This shows you the values used by the Find Duplicates step to determine whether two records are duplicates. For example:

Input data

  • Name and Address each in a single column

Match columns

  • Mapped to Name and Address Find Duplicates tags


Standardized output

Example 1: Compare Abram with Abner


  • Name segmented into Forenames, Surname. Gender is inferred from the Title (if any). So, now you can compare individual name elements for better accuracy instead of performing a generic string comparison on the Name.
  • Address segmented into multiple components. Notice that Abner has a minorstreet_number but Abram does not. So, now you can compare individual address elements for better accuracy instead of performing a generic string comparison on the Address.
  • The standardization process can also produce modifiers that correct, enhance or derive other known terms from the input. Modifiers are applied when creating blocking keys. By default, a modifier is applied that capitalizes all element values for easier comparison.
  • A useful modifier for comparing names is the Rootname modifier. For example, if you are using Forenames as a blocking key with the Rootname modifier, both Abner and Abram could have the Rootname Abraham for comparison, which could help identify the records as duplicates.
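As a rough illustration of how a Rootname modifier on a blocking key can bring Abner and Abram together, here is a small Python sketch. The `ROOTNAMES` table, the `blocking_key` and `block` functions and the record layout are hypothetical; only the Abner/Abram to Abraham pairing comes from the example above, and the product's own lookup and key generation will differ.

```python
from collections import defaultdict

# Hypothetical root-name table; only this pairing is taken from the example.
ROOTNAMES = {"ABNER": "ABRAHAM", "ABRAM": "ABRAHAM"}


def blocking_key(forename, use_rootname=True):
    """Build a blocking key from a Forenames value."""
    key = forename.strip().upper()  # default modifier: capitalize the value
    if use_rootname:
        key = ROOTNAMES.get(key, key)  # fall back to the value itself
    return key


def block(records, use_rootname=True):
    """Group record ids by blocking key; only records that share a key get compared."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec["forenames"], use_rootname)].append(rec["id"])
    return dict(blocks)


records = [
    {"id": 1, "forenames": "Abram"},
    {"id": 2, "forenames": "Abner"},
]

print(block(records))                      # {'ABRAHAM': [1, 2]}
print(block(records, use_rootname=False))  # {'ABRAM': [1], 'ABNER': [2]}
```

Without the modifier the two records land in different blocks and would never be compared; with it they share a block and can go on to detailed matching.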

Example 2: Compare Abram with Abramo

  • Name segmented into Forenames, Surname.
  • Address segmented into multiple components. Notice that Abramo has a sub-building number and type as well as a minorstreet number, but Abram does not. For Abramo, Greater London has been identified as the Province, whereas for Abram, London has been identified as the Locality. So, now you can compare individual address elements for better accuracy instead of performing a generic string comparison.
  • A useful modifier for comparing addresses is the Derived modifier. For example, if you are using Province as a blocking key with the Derived modifier, both Abram and Abramo could have Greater London as the Province for comparison, even though the original record for Abram did not actually have Greater London as the Province. In this case, the Province value was derived by the standardization process from other information found in Abram's address, which could help identify the records as duplicates.
  • Another useful modifier for comparing addresses is the StandardSpelling modifier. For example, if you are using the MinorstreetType as one of your blocking keys with the StandardSpelling modifier, RD will be converted to ROAD for comparison and could help identify the records as duplicates.
  • You could also use the StandardAbbreviation modifier for comparing addresses. The difference is that instead of converting RD to ROAD, ROAD will be converted to RD for comparison.
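The three address modifiers above can be pictured with a short Python sketch. Everything here is an assumption for illustration: the lookup tables, the function names and the idea that Province is derived from Locality are mine, not the product's actual rules, which draw on much richer reference data.

```python
# Hypothetical lookup tables; only RD/ROAD and London/Greater London
# come from the examples above.
STANDARD_SPELLING = {"RD": "ROAD"}
STANDARD_ABBREVIATION = {full: abbr for abbr, full in STANDARD_SPELLING.items()}
LOCALITY_TO_PROVINCE = {"LONDON": "GREATER LONDON"}


def standard_spelling(value):
    """StandardSpelling-style modifier: expand an abbreviation (RD -> ROAD)."""
    v = value.strip().upper()
    return STANDARD_SPELLING.get(v, v)


def standard_abbreviation(value):
    """StandardAbbreviation-style modifier: abbreviate a full term (ROAD -> RD)."""
    v = value.strip().upper()
    return STANDARD_ABBREVIATION.get(v, v)


def derived_province(address):
    """Derived-style modifier: use the Province if present, else infer it.

    Assumption: the Province is inferred from the Locality only.
    """
    province = address.get("province", "").strip().upper()
    if province:
        return province
    locality = address.get("locality", "").strip().upper()
    return LOCALITY_TO_PROVINCE.get(locality, "")


abram = {"locality": "London"}            # no Province in the original record
abramo = {"province": "Greater London"}

print(standard_spelling("Rd"), standard_abbreviation("Road"))
print(derived_province(abram) == derived_province(abramo))
```

Either spelling modifier works as long as both records are pushed to the same form, so the choice mostly depends on which form you prefer to see in the keys.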


Hopefully this gives you a glimpse into how the standardization process helps in identifying potential duplicates.

If you have any ideas for improvements, please leave me a message.
