Address Elements, how does the Duplicate Store determine which element a field is?

Hi all,

Does anyone know where I can find detailed documentation on how the Duplicate Store determines which elements apply to a text entry in an address field? I'm fine-tuning our rule sets at the moment, and knowing how this step of the process works would be super helpful so that I don't inadvertently cause any other output changes.

To clarify with an example: I have two addresses (both happen to be schools) which in our model should match under the "CompanyAddressCriteria", which is based on the standard rules. This match requires an exact match on MINORSTREETTHEME, but that fails for the validated street "Daniells". Another school, on the street "Balgowan Road", passes this check. From the rules visualisation I can see that it is specifically MINORSTREET_DESCRIPTION.EXACT that fails, which makes me wonder how MINORSTREET_DESCRIPTION itself is determined. At the moment I am considering an OR statement to pass this condition if this element is absent on both records, but I'd like to know more before I implement this.
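
A rough Python sketch of the OR condition being considered, purely for illustration. The function names and the use of `None` or an empty string for an absent element are assumptions; this is not the Duplicate Store's rule syntax, and it assumes the exact comparison fails when the element is missing on both records.

```python
from typing import Optional


def is_absent(value: Optional[str]) -> bool:
    """Treat None, empty, and whitespace-only values as an absent element."""
    return value is None or value.strip() == ""


def exact_match(a: Optional[str], b: Optional[str]) -> bool:
    """Strict comparison: both values must be present and identical."""
    if is_absent(a) or is_absent(b):
        return False
    return a.strip() == b.strip()


def exact_or_both_absent(a: Optional[str], b: Optional[str]) -> bool:
    """Relaxed comparison: pass on an exact match OR when both are absent."""
    return exact_match(a, b) or (is_absent(a) and is_absent(b))


# Hypothetical element values for two records whose street has no description.
record_a_description = None
record_b_description = ""

print(exact_match(record_a_description, record_b_description))
print(exact_or_both_absent(record_a_description, record_b_description))
```

Under these assumptions the strict check rejects the pair and the relaxed check accepts it. The relaxed check also accepts any other pair of records where the element is empty on both sides, whatever the reason it is empty.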

I would also love to know how the other elements are determined too, so any guidance would be fantastic.

Thanks,

Ben

Comments

  • Mirjam Schuke, Administrator

    Hi Ben

    First of all, there is a documentation page that should give you more details about the setup: Data Quality user documentation | Advanced configuration

    Just to summarise, the Duplicate Store uses a combination of tagging, standardization, and parsing logic to determine which address components (like MINORSTREET_DESCRIPTION) are extracted and used for matching. I assume you have tagged your data before using the Find Duplicates step?

    The step then applies standardization algorithms to normalize the address data, breaking address strings down into granular components like Street Name, Minor Street Theme, etc. It then uses internal parsing rules, based on expected address formats, to identify elements like MINORSTREET_DESCRIPTION. In this case “Daniells” was not parsed or validated as a MINORSTREET_DESCRIPTION because the parser did not recognise it as such, while “Balgowan Road” likely has a clearer structure or is more readily recognised as a street by the validation engine.
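
To illustrate the general idea of format-based parsing, here is a toy Python sketch. It is not the Duplicate Store's actual logic: the descriptor list, the `split_street` function, and the `name`/`description` keys are all hypothetical, and it assumes for illustration that a description element holds a trailing street-type word such as "Road".

```python
# Hypothetical list of street-type words; a real parser would use far richer
# reference data and country-specific rules.
KNOWN_DESCRIPTORS = {"ROAD", "STREET", "AVENUE", "LANE", "DRIVE"}


def split_street(street: str) -> dict:
    """Split a street string into a name and an optional trailing descriptor."""
    tokens = street.strip().split()
    # Only treat the last token as a descriptor when something precedes it.
    if len(tokens) > 1 and tokens[-1].upper() in KNOWN_DESCRIPTORS:
        return {"name": " ".join(tokens[:-1]), "description": tokens[-1]}
    return {"name": " ".join(tokens), "description": ""}


print(split_street("Balgowan Road"))
print(split_street("Daniells"))
```

In this toy model a single-word street such as "Daniells" ends up with an empty description, so an exact-match rule on that element has nothing to compare. Whether the real parser behaves this way for your records is something the parsed output fields would confirm.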

    Your suggested workaround runs a risk of introducing false positives because there are other more valid reasons why the MINORSTREET_DESCRIPTION might be empty.

    If possible, a better workaround would be to somehow force those specific problematic records into the correct elements, perhaps by adding some more detailed address data to the input if it contains that street name.

    It might also help with debugging in this instance to know what "Daniells" is being classed as if not a street name. Run the address through the Validate Addresses step and inspect the parsed output fields. 

    FYI, a new version of our matching capabilities is coming out soon in which you'll be able to make manual adjustments for edge cases, if that's of interest to you. Here are some recent videos of what we're building.

    Manual match cluster refinement - Update — Experian Data Quality Community