Match Records

Itumeleng_sage
edited December 2023 in General

Good Day, I have two datasets can I match them and get the records that matches either by company name or vat number or email domain

Comments

  • Josh Boxer
    Josh Boxer Administrator

    Hello

    'Matching' can mean a few different things so I would suggest looking at some different Workflow steps before deciding what you are trying to achieve. Do you want to retain the two separate sources noting the records that exist in both or merge the two sources together removing the duplicates.

    Lookup step can tell you if a value in one source exists in another

    Join step can return all matching records (or non-matching records) https://docs.experianaperture.io/data-quality/aperture-data-studio-v2/get-started/create-a-workflow/#join

    Harmonize Duplicates or harmonization is the process of deduplication - taking multiple records that are deemed to be for the same underlying real-world subject and producing a single resulting record. https://docs.experianaperture.io/data-quality/aperture-data-studio-v2/harmonize-duplicates-step/overview/

  • Sueann See
    Sueann See Experian Super Contributor

    @Itumeleng_sage Are you are looking to do an exact match or do you need fuzzy matching? Can you give an example of the input data and output you are expecting?


    There may be several functions that may be useful for you.

    • A number of functions under the Lookup category allows you to define a number of different match types including the flexibility to define a pattern match with regular expression.
    • Transform Text function allows you to do some standardization on the text you are comparing to allow for different ways of matching, for example, you can Remove Noise prior to comparing two values or convert the text to a Soundex string.


    For more complex use cases, the Find duplicates step that incorporates standardization and matching algorithms to group together records containing similar contact data (e.g. name, address, email, phone) may be helpful too.

  • @Josh Boxer thank you the input I will test it out.

    @Sueann See I have two tables, CustomersA with ProductsA and CustomersB with ProductsB. Now I can need to find which custmers have productsA and ProductsB. I can only match with company name or vat number or company registration or email domain some records might not have vat number or company registration or email domain. I can rely mostly on company name but they might have different spelling

  • Sueann See
    Sueann See Experian Super Contributor

    @Itumeleng_sage Here is a simple example with the Lookup step. Whether this method will work for you or not depends on the underlying quality of your data.

    I have a file for Product A and another for Product B. I used a lookup step and try to check if any rows in ProductA exists in ProductB based on Company or VAT or Email.



    When defining the lookup step, i created the Company, VAT and Email definitions separately so they can be evaluated one by one and the results are produced side by side. For Company and Email i used a Case-insensitive match, whereas for VAT i used an Exact match.


    The results show that while i couldn't find some matches using company because there are more differences that what a case insensitive match can handle, both the VAT and Email yield matches indicating that all of the ProductsA exists in ProductsB.