Match Records
Good Day, I have two datasets can I match them and get the records that matches either by company name or vat number or email domain
Comments
-
Hello
'Matching' can mean a few different things so I would suggest looking at some different Workflow steps before deciding what you are trying to achieve. Do you want to retain the two separate sources noting the records that exist in both or merge the two sources together removing the duplicates.
Lookup step can tell you if a value in one source exists in another
Join step can return all matching records (or non-matching records) https://docs.experianaperture.io/data-quality/aperture-data-studio-v2/get-started/create-a-workflow/#join
Harmonize Duplicates or harmonization is the process of deduplication - taking multiple records that are deemed to be for the same underlying real-world subject and producing a single resulting record. https://docs.experianaperture.io/data-quality/aperture-data-studio-v2/harmonize-duplicates-step/overview/
0 -
@Itumeleng_sage Are you are looking to do an exact match or do you need fuzzy matching? Can you give an example of the input data and output you are expecting?
There may be several functions that may be useful for you.
- A number of functions under the Lookup category allows you to define a number of different match types including the flexibility to define a pattern match with regular expression.
- Transform Text function allows you to do some standardization on the text you are comparing to allow for different ways of matching, for example, you can Remove Noise prior to comparing two values or convert the text to a Soundex string.
For more complex use cases, the Find duplicates step that incorporates standardization and matching algorithms to group together records containing similar contact data (e.g. name, address, email, phone) may be helpful too.
0 -
@Josh Boxer thank you the input I will test it out.
@Sueann See I have two tables, CustomersA with ProductsA and CustomersB with ProductsB. Now I can need to find which custmers have productsA and ProductsB. I can only match with company name or vat number or company registration or email domain some records might not have vat number or company registration or email domain. I can rely mostly on company name but they might have different spelling
0 -
@Itumeleng_sage Here is a simple example with the Lookup step. Whether this method will work for you or not depends on the underlying quality of your data.
I have a file for Product A and another for Product B. I used a lookup step and try to check if any rows in ProductA exists in ProductB based on Company or VAT or Email.
When defining the lookup step, i created the Company, VAT and Email definitions separately so they can be evaluated one by one and the results are produced side by side. For Company and Email i used a Case-insensitive match, whereas for VAT i used an Exact match.
The results show that while i couldn't find some matches using company because there are more differences that what a case insensitive match can handle, both the VAT and Email yield matches indicating that all of the ProductsA exists in ProductsB.
0