Challenge 3 Solutions

Danny Roden
Danny Roden Administrator
edited June 5 in Solutions

So did you figure it out? What approach did you take?

Post your solution below in the comments (remember to include a screenshot 📷 or a video 🎦 walk-through of your solution so others can see as well as the .dmx file containing the function/workflow containing your solution.)


Comments

  • Danny Roden
    Danny Roden Administrator
    edited October 2020

    To access the sample solution provided for this challenge, please toggle the below:

    The approach I've taken with this challenge is summarised below.


    First, to structure the data into a consistent (and valid) date format, I built a reusable function which used the 'substring' function to take the various different bits of the input and then feed that into a 'Create date/timestamp' function. After initially applying this to my data I noticed some errors present on my results (due to the fact that I hadn't taken into account values that aren't a valid date at all - e.g. '0'). So I added an 'If then else' function to check whether the input value appeared valid or not (based on length). See here for a preview of the STRING TO DATE CONVERSION FUNCTION that I ended up with.


    Next, after converting the data, I added a Validate Step to my workflow with rules for each of the tests in the challenge outline. The summary of my logic is below:


    (a) I built another reusable function (Is not a future date) which is a simple 'Less Than' function referencing the 'current date/time (dynamic)' function. You can see a preview of it here. I then applied this to all 3x fields.

    (b) the 'Start Date - is before Close Date' function was built with a simple 'Less Than' function (see here)

    (c) and the 'DOB - in valid range' test was built using the 'Get Age' function, which I fed into a 'In Valid Range' function (see here for a preview of this function)


    Finally, I prepared an output taking data from the 'Show all rows' output of the Validate step, which filtered on just records failing any rules. I then adjusted the columns for visual presentation and found 459x records failing one or more tests. To see my final output, please check out this final screenshot (and below for the .dmx package for the solution itself).

  • Sean Leoszko
    Sean Leoszko Experian Contributor

    The first part to solve was the formatting of dates, initially I tried using math to figure out when to add a delimiter between the time periods of the date, see below:


    Though it worked, I thought of another way that was a bit more streamlined. This also converted zero's to null but depending on the use case could be converted easily to another value. This was created as a custom function so it can be used across all three columns in a workflow



    After this, I created a workflow to apply the custom function across the three date fields and validate the data:

    The sirst Transform aims at applying the formatting of date fields created above:


    The second Transform is to validate the three rules:

    -The date (for all columns) is not a future date

    -Close Dates should never be before Start Dates (added a clause to allow for null Close Dates)


    -The ages of those whose date of births are present should be between 18 and 95 years old.

    I added all these rules as new columns in a Transformation Step. I then use a validation step to validate if these three columns are true. Another way would be to include these transformations into the Validation step itself.

    I couldnt find the actual challenge 3 sample file so I created my own, so the results will vary. Below are the failing rows:

    Also, could not upload my dmx as it says file extension is not allowed


  • Sean Leoszko
    Sean Leoszko Experian Contributor

    The first part to solve was the formatting of dates:

    I converted zero's to null but depending on the use case could be converted easily to another character or value.

    After the custom function of convert these dates was created and published. The next stage was to create the workflow

    Transform 1 - Applies the Format Date function to all three date fields:

    Transform 2 - Creates the Rules for Validation - (This could have been created in the validation step to streamline it further)

    Rule 1-The date (for all columns) is not a future date


    Rule 2-Close Dates should never be before Start Dates


    Rule 3-The ages of those whose date of births are present should be between 18 and 95 years old.


    Added the Validation step at the end to show all the failed rows. The validation step only checks if these three fields are true.

    Below are the failed results from the validation step (I could not find the Challenge 3 Data so results will vary):


    Tried to upload my dmx file but I received the follow message:

    Had to zip the file in order to upload it.


  • William H. Palmer
    William H. Palmer Experian Employee

    A (very lengthy) video walkthrough is below (though I recommend watching at increased playback speed, for your own sanity's sake).

    My approach has three phases:

    1) Profile the data to understand the scope and nature of the inconsistencies

    2) Create a function that will intelligently handle the identified inconsistencies and convert the dates

    3) Use a Validate step to create the rules

    The function I used to convert the dates looked like this (explained in more detail in the video):

    I then used this function to check whether dates were futuristic:

    I then created a custom function for my rule to compare Start Date and Close Date (you'll notice that the structure is very very similar):

    And a similar one for the DoB:

    Finally, I used these functions in a workflow, and then used a Filter to isolate records for review, with a Transform to tidy the columns for readability. (The parallel stream below was used to sanity check certain rules results.)

    In the video, I explain a bit more of my thought process and the investigation that went into developing this solution!


  • Ebru Erozturk
    Ebru Erozturk Experian Employee

    My approach was;

    • Checked the lengths from profile results and seems like we have 0’s as date for Close Date and Dob


    • In the function (Convert To Date_Custom);

    First, created 'LengthOfDate' variable from of the original date.


    Secondly, created 'CalcDate' variable as;, if length is 8, used as is, if length is 7 concatenated with a leading 0, if length is not 7 or 8, nullified the information.


    Then, used 'Create date/time stamp' to create a date with the CalcDate variable that I've created, using substring to be able to get day/month/year information.

    • Added a transform step to be able to convert dates to actual dates, to be able to see the originals & calculated ones, I've created 3 additional columns names are ending with _C:
    • Add another transform step to be able to define the flags for validation rules:

    For LessThanTodayFlag; created CurrentDatetime variable and used it to check for all of the dates:

    For CorrectCloseDayFlag; checked if close date is greater than or equal to start date & if close date is null or empty:

    For AgeIn18-95RangeFlag; get the age from the calculated dob and checked if it's between 18-95:

    And in the end, I have these 3 additional columns which I used for validate step:

    • Added Validate step to check the supplied validation rules
    • Took a snapshot using failing rows in the end and created a view:
    • Didn't use the validation results, instead used the flags that I've created so we don't need to use filter. Adding the dmxd file as well: