When should I use a Match Lookup rather than a join
I have a large amount of data (hundreds of millions of transactional records) that I need to match up against a list of master data records by name (less than 100,000) - which would be better, a match lookup or a join?
Some of these records might constitute exact matches.
Some of these records might have case differences i.e. John Smith vs. john smith
I don't have a real fixed key that I can guarantee will make a join, but I might actually have missing or unmatchable transactional records.
Best Answer
-
Hi Clinton,
The way I would tackle this would be to build a 'matchkey' field on each dataset using a Transform step (e.g. concatenate the 2x fields and then standardise the values however you want, e.g. remove noise and upper case) before then doing a join to pull through all the additional columns from the Account table onto the Transactional table.
However you can also achieve this by simply using a Join step and selecting both the FirstName and LastName fields as the keys and then selecting to do a case-insensitive match:
The benefits of the first approach is that it will find more matches if your data contains variations outside of just casing (e.g. Paddy O'Neil vs PADDY ONEIL).
Best bet is to profile the data first to ensure you know how it varies and what the impact is of matching using each approach, and then build (and iterate) a workflow that is right for your data.
5
Answers
-
-
Hi Clinton, what's the relationship between these datasets and what are you hoping to achieve?
Are you looking to do a full-join to return all the different combinations, or to simply bring flag records that match, or to return a list of attributes from one dataset onto the other?
0 -
The transactional data contains transactional events - basically shopping cart transactions that were made with contact data cut not tied to a 'known account'
GUID, FIRSTNAME, LASTNAME, SHIPADDRESS(multiline/unstructured), EMAIL, PHONE, PRODUCT, DATE, QTY, PRICEPER, ORDERVALUE
The known accounts are my master records
ACCOUNTID, TITLE, FIRSTNAME, LASTNAME, BILLADDRESS1, BILLADDRESS2, BILL_CITY, BILL_ZIP, BILL_COUNTRY, PHONE, EMAIL
0 -
@Clinton did this answer your question?
0 -
Is it worth creating hash-keys to match in Aperture? (seemed to make a big difference when matching strings in Pandora)
1 -
My view is that hash keys can be useful.
1