Many to many lookup of matching words
I have a table which I'm treating as a lookup table.
It's format is something like.
Key Value
1 This is item 1
2 This is another item
3 Item three
However I have 1000's of key values.
I want to retrieve the key when I pass in a string to compare.
e.g. my data is in another table might be
1
another
three
However my main data table has 10's of 1000's of rows.
And I want these to respectively match key
1 (matched lookup on "1")
2 (matched lookup on "another")
3 (matched lookup on "three")
What would be the recommended approach to this please.
I've successfully achieved this for another challenge where I want match a single word that matches a single value.
Thanks
0
Answers
How would you decide on the keyword that you want to match on?
Example, if you now have one more line of data called "This" what do you want it to match to?
Data Table
1 (match to key 1)
another (match to key 2)
three (match to key 3)
This ?
Thanks for getting back to me @Sueann See
That's a good question.
In that example I may wish is to concatenate all keys into a single result or return "This" or "unknown" so that someone can take manual intervention to decide.
I'm now thinking that I might want to try split the lookup table so that I have:
Key |Value |Val1 | Val2 |Val3 | Val4
1 |This is item 1 |This | is | item | 1
2 |This is another item | This | is |another | item
3| Item three | Item | three | |
This might be tricky as values are multiple word count.
And then run some kind of test to see if any words match and give is a score along with the Key for checking.
@DTAconsulting it seems a bit unusual that your Data value is a single word and the lookup value consist of multiple words. i'm curious of what the use case is. Can you share a little bit more if you don't mind?
I managed to get this far
...but
Note: Column1 in the Lookup Table can be achieved by replacing space in the Value column with |.
"This|is|item|1" in terms of a regular expression actually means "This" or "is" or "item" or "1".
I'll ask around to see if anyone has a better idea, but hopefully this provides some inspiration.
Hi @DTAconsulting
If I understand your requirement correctly I think it's as simple as using the "Contains Match" function, but chanigng the configuration of 'input type' from "whole value" to "space separated". This is a really powerful function that does a lookup based on whatever you set the input as, in it's simplest form the "whole value" setting effectively just works like a normal lookup. However when you use the other options it can perform a lookup from multiple different bits of the cells (e.g. comma separated list components, space-separated words or regular expression definitions that separate the data as appropriate). More documentation can be found here.
See below for a preview of this in action using slightly different data:
Figure 1: Results (this is the output of using the above mentioned function)
Figure 2: Config (this is how I've configured the function in the above example)
Figure 3: Ref data (for context this is the lookup file I'm referencing in the above example)
Thanks for this suggestion @Sueann See , sometimes the best answer is simpler than my often over engineered solutions :)
The data could actually contain more than a single word. e.g. a Product Description.
The Use Case is to look up a best fit Product Category from the Product Description.
e.g. Product Description is "Trek Madone SL 7 Disc Road Bike 2021 Large"
The key word(s) here are (Road)+Bike
The category look up required is Bicycles.
Sub category might be Road Bikes
It's possible I have a sub category called Disc Brakes which fits under Bike components rather than Bicycles. Thinking along lines of GS1 hierarchy but not quite the same.
If it's indeterminate what category to select, I'd want to bring back both, either delimited or as one row per possible match. Does that make sense?
e.g. it would return a match on:
Road Bikes
and
Disc Brakes
Thank you @Danny Roden, thank you both.
Between you both and your further explanation of the nuances of both functions based on use of "Whole word" or "Space Separated", this is very neat and succinct way to achieve most of what I want. I'll need to play with the input slightly and experiment with the format of the lookup but this gets me so much further without the complex custom function I built.
Actually the "Replace Matches" is useful for another use case where I'm trying to populate a description by swapping out with abbreviation replacements, just like your job title examples.
For the use case I describe above I need to replace the whole of the data with the lookup value into a new column.
As mentioned I might have more than 1 match and therefore need to return all possible values for further manual scrutiny.
e.g.
Lookup table
Key Value
key1 This is item one
key2 This is item 2
key3 another item
key4 Rubbish data
Data
ID Desc
001 Item one
002 item 2
003 another
004 Something else
Returns
ID Desc DescLookup
001 Item one key1, key2 Matched on "Item" and "one"
002 item 2 key1, key2 Matched on "Item" and "2"
003 another key3 Matched on "another"
004 Something else no match
OR
Returns
ID Desc DescLookup
001 Item one key1 Matched on "Item" and "one"
001 Item one key2 Matched on "Item" and "one"
002 item 2 key1 Matched on "Item" and "2"
002 item 2 key2 Matched on "2"
003 another key3 Matched on "another"
004 Something else no match
@DTAconsulting In order for the system to return all the matching keys/values, the only way i found is to convert both the Data and Lookup datasets into keyword lists, then do an All Values lookup.
Data Keywords
Lookup keywords
Workflow
Lookup Definition
Results
You can then further group this results by ID and Desc to get a holistic view on the matching keys and keywords.
In order to reduce false positive matches, I think you can try to reduce the keywords list for the Lookup dataset at least to make it a more meaningful lookup. When you are exploring the data, profiling and drilling down to the possible values of the column will help determine these commonly occurring keywords.