Many to many lookup of matching words

DTAconsultingDTAconsulting Learner
edited April 7 in General discussion

I have a table which I'm treating as a lookup table.

It's format is something like.


Key Value

1 This is item 1

2 This is another item

3 Item three


However I have 1000's of key values.

I want to retrieve the key when I pass in a string to compare.

e.g. my data is in another table might be

1

another

three


However my main data table has 10's of 1000's of rows.


And I want these to respectively match key

1 (matched lookup on "1")

2 (matched lookup on "another")

3 (matched lookup on "three")


What would be the recommended approach to this please.


I've successfully achieved this for another challenge where I want match a single word that matches a single value.

Thanks

Tagged:

Answers

  • Sueann SeeSueann See Administrator

    How would you decide on the keyword that you want to match on?

    Example, if you now have one more line of data called "This" what do you want it to match to?

    Data Table

    1 (match to key 1)

    another (match to key 2)

    three (match to key 3)

    This ?

  • Thanks for getting back to me @Sueann See

    That's a good question.

    In that example I may wish is to concatenate all keys into a single result or return "This" or "unknown" so that someone can take manual intervention to decide.


    I'm now thinking that I might want to try split the lookup table so that I have:

    Key |Value |Val1 | Val2 |Val3 | Val4

    1 |This is item 1 |This | is | item | 1

    2 |This is another item | This | is |another | item

    3| Item three | Item | three | |

    This might be tricky as values are multiple word count.

    And then run some kind of test to see if any words match and give is a score along with the Key for checking.

  • Sueann SeeSueann See Administrator

    @DTAconsulting it seems a bit unusual that your Data value is a single word and the lookup value consist of multiple words. i'm curious of what the use case is. Can you share a little bit more if you don't mind?

    I managed to get this far

    ...but

    • more work is needed to handle case insensitivity (if required)
    • for those with multiple matches, I only managed to return one of the matching key values

    Note: Column1 in the Lookup Table can be achieved by replacing space in the Value column with |.

    "This|is|item|1" in terms of a regular expression actually means "This" or "is" or "item" or "1".


    I'll ask around to see if anyone has a better idea, but hopefully this provides some inspiration.

  • Danny RodenDanny Roden Administrator

    Hi @DTAconsulting

    If I understand your requirement correctly I think it's as simple as using the "Contains Match" function, but chanigng the configuration of 'input type' from "whole value" to "space separated". This is a really powerful function that does a lookup based on whatever you set the input as, in it's simplest form the "whole value" setting effectively just works like a normal lookup. However when you use the other options it can perform a lookup from multiple different bits of the cells (e.g. comma separated list components, space-separated words or regular expression definitions that separate the data as appropriate). More documentation can be found here.

    See below for a preview of this in action using slightly different data:

    Figure 1: Results (this is the output of using the above mentioned function)


    Figure 2: Config (this is how I've configured the function in the above example)


    Figure 3: Ref data (for context this is the lookup file I'm referencing in the above example)


  • DTAconsultingDTAconsulting Learner
    edited April 8

    Thanks for this suggestion @Sueann See , sometimes the best answer is simpler than my often over engineered solutions :)


    The data could actually contain more than a single word. e.g. a Product Description.


    The Use Case is to look up a best fit Product Category from the Product Description.

    e.g. Product Description is "Trek Madone SL 7 Disc Road Bike 2021 Large"

    The key word(s) here are (Road)+Bike


    The category look up required is Bicycles.

    Sub category might be Road Bikes

    It's possible I have a sub category called Disc Brakes which fits under Bike components rather than Bicycles. Thinking along lines of GS1 hierarchy but not quite the same.


    If it's indeterminate what category to select, I'd want to bring back both, either delimited or as one row per possible match. Does that make sense?

    e.g. it would return a match on:

    Road Bikes

    and

    Disc Brakes

  • Thank you @Danny Roden, thank you both.

    Between you both and your further explanation of the nuances of both functions based on use of "Whole word" or "Space Separated", this is very neat and succinct way to achieve most of what I want. I'll need to play with the input slightly and experiment with the format of the lookup but this gets me so much further without the complex custom function I built.

  • Actually the "Replace Matches" is useful for another use case where I'm trying to populate a description by swapping out with abbreviation replacements, just like your job title examples.

    For the use case I describe above I need to replace the whole of the data with the lookup value into a new column.

    As mentioned I might have more than 1 match and therefore need to return all possible values for further manual scrutiny.

    e.g.

    Lookup table

    Key Value

    key1 This is item one

    key2 This is item 2

    key3 another item

    key4 Rubbish data


    Data

    ID Desc

    001 Item one

    002 item 2

    003 another

    004 Something else


    Returns

    ID Desc DescLookup

    001 Item one key1, key2 Matched on "Item" and "one"

    002 item 2 key1, key2 Matched on "Item" and "2"

    003 another key3 Matched on "another"

    004 Something else no match


    OR


    Returns

    ID Desc DescLookup

    001 Item one key1 Matched on "Item" and "one"

    001 Item one key2 Matched on "Item" and "one"

    002 item 2 key1 Matched on "Item" and "2"

    002 item 2 key2 Matched on "2"

    003 another key3 Matched on "another"

    004 Something else no match

  • Sueann SeeSueann See Administrator

    @DTAconsulting In order for the system to return all the matching keys/values, the only way i found is to convert both the Data and Lookup datasets into keyword lists, then do an All Values lookup.

    Data Keywords

    Lookup keywords

    Workflow

    Lookup Definition


    Results


    You can then further group this results by ID and Desc to get a holistic view on the matching keys and keywords.

    In order to reduce false positive matches, I think you can try to reduce the keywords list for the Lookup dataset at least to make it a more meaningful lookup. When you are exploring the data, profiling and drilling down to the possible values of the column will help determine these commonly occurring keywords.

Sign In or Register to comment.