Challenge 4 Solutions

Danny Roden, Administrator
edited June 5 in Solutions

So did you figure it out? What approach did you take?


You know what the results should look like, but how did you get there?

Post your solution below in the comments. Remember to include a screenshot 📷 or a video 🎦 walk-through so others can see how it works, as well as the .dmx file containing the function/workflow for your solution.


Comments

  • Charlie Chambers, Experian Employee

    Great challenge!

    I solved this by initially standardising the rows using "Remove noise" and "Upper case".

    To parse out the name I used Parse by Regular Expression for "COKE", "PEPSI" and "7UP".

    To parse the quantities, I used Parse by Regular Expression with [0-9]+PK to match instances like 6pk and [0-9]+X to match instances like 2x. I then used Regular Expression Replace to format these as X-pack.

    To parse the volume, I used Parse by Regular Expression with [0-9]+L to match entries like 2L and [0-9]+ML to match entries like 500ML. To make everything uniform, I used Regular Expression Replace on ([0-9]+)L, taking the first group with $1 and appending three 0s to convert litres into millilitres. I then divided by 1000 and concatenated L to derive the volume in litres.

    Each result was stored in a variable, then I concatenated them all together to get the final result 😀
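For readers without the .dmx file to hand, the steps described above can be roughly sketched in Python. This is only an illustration, not the Data Studio workflow itself: the `standardise` helper is an assumed stand-in for "Remove noise" and "Upper case", the function names are hypothetical, and the order and spacing of the final concatenation are guesses.

```python
import re

# Patterns taken from the comment above; the capture groups are added here
# so the numbers can be reused when formatting the result.
NAMES = re.compile(r"COKE|PEPSI|7UP")
PACK = re.compile(r"([0-9]+)(?:PK|X)")
MILLILITRES = re.compile(r"([0-9]+)ML")
LITRES = re.compile(r"([0-9]+)L")


def standardise(raw):
    """Assumed stand-in for "Upper case" + "Remove noise":
    upper-case, then drop anything that is not a letter, digit or space."""
    return re.sub(r"[^A-Z0-9 ]", "", raw.upper())


def parse_product(raw):
    text = standardise(raw)
    name = NAMES.search(text)
    pack = PACK.search(text)
    ml = MILLILITRES.search(text)
    litres = LITRES.search(text)

    # Bring every volume to millilitres first (litres get "000" appended),
    # then divide by 1000 so the output is always expressed in litres.
    volume_ml = None
    if ml:
        volume_ml = int(ml.group(1))
    elif litres:
        volume_ml = int(litres.group(1) + "000")

    parts = []
    if name:
        parts.append(name.group(0))
    if volume_ml is not None:
        parts.append(f"{volume_ml / 1000:g}L")
    if pack:
        parts.append(f"{pack.group(1)}-pack")
    return " ".join(parts)


print(parse_product("coke 2l 6pk"))     # COKE 2L 6-pack
print(parse_product("Pepsi 500ml 2x"))  # PEPSI 0.5L 2-pack
```

The sample inputs are made up for illustration; the real challenge data may need extra patterns (for example, decimal volumes such as 1.5L are not covered by `[0-9]+L`).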


  • Akshay Davis, Experian Super Contributor

    I took a slightly different approach. If you assume that the only standard terms are the volumes and pack sizes, then relying on a fixed list of product names isn't robust.

    Based on that the approach is:

    1. Identify the volume
    2. Identify the pack size
    3. The remaining text is the product name (this allows for new product names to be picked up automatically)
    4. Standardize volume, pack size and naming then concatenate the results.

    To extract the volume and pack size, it's simply

    The product name is then the remaining text

    Then concatenate the results

    This then handles 7UP, 7 UP and 7-UP, as well as the less common drink variants in that list, like Coke Personalised Edition.
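As a rough illustration of the four steps listed above (the original screenshots are not reproduced here), the same idea can be sketched in Python. Everything in this block is an assumption rather than the workflow from the comment: the regular expressions, the hypothetical `split_product` function, the rule that joins a digit to the following word (so that "7 UP" and "7-UP" become "7UP"), and the output order.

```python
import re

# Assumed patterns for the two "standard" terms.
VOLUME = re.compile(r"\b(\d+(?:\.\d+)?)\s*(ML|L)\b")
PACK = re.compile(r"\b(\d+)\s*(?:PK|X)\b")


def split_product(raw):
    text = raw.upper()

    # Steps 1 and 2: identify the volume and the pack size.
    volume = VOLUME.search(text)
    pack = PACK.search(text)

    # Step 3: whatever remains after removing them is the product name.
    rest = PACK.sub(" ", VOLUME.sub(" ", text))

    # Step 4: standardise each part.
    name = re.sub(r"[^A-Z0-9]+", " ", rest).strip()    # punctuation -> spaces
    name = re.sub(r"(?<=\d) (?=[A-Z])", "", name)      # "7 UP" -> "7UP"

    parts = [name] if name else []
    if volume:
        number, unit = volume.groups()
        litres = float(number) if unit == "L" else float(number) / 1000
        parts.append(f"{litres:g}L")
    if pack:
        parts.append(f"{pack.group(1)}-pack")
    return " ".join(parts)


print(split_product("7-up 330ml 6pk"))                # 7UP 0.33L 6-pack
print(split_product("Coke Personalised Edition 2L"))  # COKE PERSONALISED EDITION 2L
```

Because no product list is involved, a previously unseen name passes straight through as the product name, which is the trade-off this approach accepts.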


  • Danny Roden, Administrator
    Interesting approach, @"Akshay Davis". This is a good, flexible alternative. However, it does result in the 'special / personalised' editions of the Coke product being listed as their own separate products, so it doesn't produce the 'clean' output. For the purposes of the demo this was an intentional red herring to be navigated, but as we all know, clients' preferences in these sorts of tasks vary all the time. If it's useful to pick these out as distinct products for onward processing, then that's a neat approach. 😀