Challenge 3 - It's a Date!

Danny Roden
Danny Roden Administrator
edited June 5 in Challenges

This latest challenge is all about dates. The sample dataset highlights a common challenge many of us are familiar with, which is around non-standard date formats (most often seen when working with data exports from mainframe systems or proprietary fixed width file formats).

Have you ever tried running analytics on dates that aren’t formatted properly? Or sorting them only to find that all the ‘1st of the months’ end up being bundled together? It can be a real problem, particularly if data quality isn’t considered from the beginning.

As you will see from the below screenshot the data for this challenge is presented as an alphanumeric string; which makes it almost impossible to use for effective sorting, logic based decisions and rule evaluation.

The challenge is 3x fold:

  1. Build and then apply a function to convert the values in the 3x input columns into a ‘date’ format that can then be used later.
  2. Build rules to test if the data is fit for purpose. Data is fit for purpose if:
    1. The date (for all columns) is not a future date
    2. Close Dates should never be before Start Dates
    3. The ages of those whose date of births are present should be between 18 and 95 years old.
  3. Finally generate a view of data that contains all of the records failing each of the rules to be used for manual remediation work.

Note: If you get stuck along the way, a sample solution is available here which you can reference to get back on track.

When you've figured it out, please post your solution here.

Your solution should include a screenshot (or a video walk-through) of your results. Please also include the .dmx file containing your solution and a brief description of the approach you took.

Comments

  • Sean Leoszko
    Sean Leoszko Experian Contributor

    Is there a sample file of these dates that I can use?

  • Danny Roden
    Danny Roden Administrator

    Good spot @Sean Leoszko. Looks like it didn't upload properly first time around, however it should now be there 😊

  • Sai
    Sai Experian Employee
    edited November 2020

    Step 1: Noticed the length of the dates in all three rows were inconsistent. Created function to convert aphanumberic data type to Date.

    List of Reusable functions created:


    Function 1 - New Date:

    Function 2: Date Conversion:


    Transformation to create a New Column (with Date Format): Example shown below : DoB



    Applied these reusable functions to convert Start Date and Close Date (Alphanumeric data types) to Date formats.

    Step 2: Rules to check if the data is fit:

    a.      Created a reusable function to check if the date is a future date:

    Function 3: Valid Date


    Applied this function in transform step for Start Date, Close Date and DoB to check the valid Date Status.

    b.      Built a logic with Multi compare function to check if Close Date is before the Start Date


    c.      Built a logic to identify the rows with age between 18 and 95 (including) for Valid/correct DoBs.


    Step 3: Used a split function to distribute passed/ failing Rows. Finally created a report with only failing rows not satisfying above conditions.