Challenge 1 Solutions

Roden, Danny · July 2020

So did you figure it out?

How many records did you find with inconsistencies between Gender and Title?

Post your solution below in the comments (remember to include a screenshot 📷 or a video 🎦 walk-through of your solution so others can see as well as the .dmx file containing the function/workflow containing your solution.)

Roden, Danny · July 2020

Sample Solution

To access the sample solution provided for this challenge, please toggle the below:

My approach to this challenge was to build a function (see here for a high level overview) that classifies gender-specific titles and then feeds that into a decision to find the mismatches. The approach I took, identified a total of 408 records with mismatches (output summary image here).

However, I did notice (through profiling the combinations of title and gender), that there were some further discrepancies where the title was unknown but the gender was specified (and vice-versa)

See below for a copy of my workflow and function.

Week 1 - Solution.dmx

Josh Boxer · October 2020

1. After importing the data I used a Group step, by title.

2. Then exported the results of this step to create a second CSV to which I manually added a GenderMatch column with the logic Mr = M etc.

(I'm sure it is possible to replace this step and create the lookup table within the workflow.)

3. I amended the Group step, all columns, to remove any duplicates, leaving ~60k of the original ~99k rows.

4. Used a Lookup step to add a new column mapping the correct Gender for each title based on the second CSV.

5. Filter step to remove all 'U' rows (those where Title is Dr, null or UN)

6. Transform step to add a True/False column comparing source gender with lookup gender.

7. Split step to get all of the False rows where the original gender is not as expected.

There are 2,154 rows where the title does not match the expected gender (including Gender is null or 'U', but Title is known)

Data Studio Export - Challenges - 01.dmx

Simms, Henry · October 2020

This is the solution we came up with in our breakout team at the Knowledge Cafe session. Thanks @Candice Tsang @YvonneSaulnier, @Romiel Noumbissi, @Jacky Lu and Ashley.

First we explored the example data View, grouping on Gender (to see the unique genders) and the Title (to see unique genders). This showed us the sort of problems we'd need to handle in the gender mismatch logic.

We considered two options:

Using a domain with valid title / gender combinations, and a function that does a lookup
Hard-coding values into a function that checks the combinations

In the interests of time, and because we thought it was unlikely that we'd encounter new gender and title values in later updates of the data, we went for the hardcoded option.

We decided to create a saved function called "Check gender and title are not mismatch", which would return true (if no mismatch) or false (for mismatch). We'd then have the ability to use this function as a rule in a Validate step to easily explore failing rows and check metrics for rule failure.

First we created some parameters and test values:

Then we created 3 conditions:

If the gender is empty (null or whitespace only) or U, there will be no mismatch (Equals check)
If the gender is M, the title cannot be "Mrs", "Miss", or "Ms". If there is, it's a mismatch (2 Equals checks, And'd together)
If the gender is F, the title cannot be "Mr". If there is, it's a mismatch (2 Equals checks, And'd together)

We fed these 3 conditions into an If Then Else. Note that I've used variables for each of the 3 output conditions above for readability of the function: