Adding a variable comment as a "call to action" for data quality issues found

Sueann See Experian Contributor
edited April 30 in General discussion

This is an experiment on handling HR data quality issues using a validation workflow in Aperture Data Studio v2.X.

I built some simple HR data quality rules for FTE, Gender and Employment Category. With a validation step, I am able to obtain the following results for each of the rules.

I then want to add a comment as a "call to action" for my respective regional teams to fix the data elements that have failed the quality check. The problem is that the number of data elements that can fail ranges from 0 to 3. How can I create a variable comment based on the validation results?

I created a reusable function that evaluates whether each rule has failed, assigns a name for the data element if its rule has failed, compiles the names into a list, trims the trailing comma, replaces multiple commas with a single comma, and trims the leading comma.

The trimming and replacing of commas is necessary because we never know how many data elements have failed, so we may end up with a partial or empty list:

,, (if there are no failing elements)

FTE,, (if only FTE check failed)

,Gender, (if only Gender check failed)

,,Employment Category (if only Employment Category failed)
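
As a rough illustration outside Data Studio, the clean-up steps described above could be sketched in Python as follows. The function name and the boolean parameters are hypothetical; the original is a visual Data Studio function, not code.

```python
import re


def failed_elements(fte_failed: bool, gender_failed: bool, category_failed: bool) -> str:
    """Hypothetical analogue of the reusable function described in the post."""
    # Each slot holds the element name if its rule failed, otherwise an empty string.
    parts = [
        "FTE" if fte_failed else "",
        "Gender" if gender_failed else "",
        "Employment Category" if category_failed else "",
    ]
    raw = ",".join(parts)                  # e.g. "FTE,," or ",Gender,"
    cleaned = raw.rstrip(",")              # trim trailing commas
    cleaned = re.sub(r",+", ",", cleaned)  # replace runs of commas with one comma
    return cleaned.lstrip(",")             # trim leading commas


print(failed_elements(True, False, True))
```

The three clean-up calls mirror the order given in the post: trailing trim, comma collapse, leading trim.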

Then I added a comment column with a transformation function that counts the number of failed data elements returned by this function and forms the comment accordingly. If there are no failed data elements, the comment is "No action required". Otherwise, the comment includes the list of failed data elements. If there is an action to be taken, I would also like to show this as a warning.
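
The comment logic can be sketched in Python along these lines. This is only an approximation: the function name and the warning wording are assumptions, since the exact comment text is not given in the post.

```python
def build_comment(failed: str) -> str:
    """Hypothetical analogue of the comment-column transformation."""
    # Count the non-empty names in the comma-separated list.
    count = len([name for name in failed.split(",") if name])
    if count == 0:
        return "No action required"
    # Assumed wording; the original comment text is not shown in the post.
    return f"WARNING: please fix {failed}"


print(build_comment("FTE,Gender"))
```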

I finally managed to get the output comment that I wanted:

I am wondering if there is a more elegant/efficient way of handling this?

Comments

  • Henry Simms Administrator
    edited May 5

    Hi @Sueann See

    I've been looking at this but didn't actually find much scope for improvement in the overall approach. There are a few tweaks you could make to streamline both functions and improve maintainability and readability.

    In the function where you construct the list of comments based on the validation results, you could avoid the two separate Trim functions and the Regular expression replace (using regex should always be a last resort) by first sorting the list (which brings any empty values to the front) and then trimming:

    Because Trim removes multiple instances of the trim character, you'll remove all the unwanted commas in one go. Obviously this will only work if the order in which the comments appear isn't important.
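
A minimal Python sketch of the sort-then-trim idea, assuming the same three rules as in the question. The function name and parameters are hypothetical, and Python's `sort` and `strip` stand in for the Data Studio sort and Trim functions, which may differ in detail.

```python
def failed_elements_sorted(fte_failed: bool, gender_failed: bool, category_failed: bool) -> str:
    """Hypothetical analogue: sort the list so blanks come first, then trim once."""
    parts = [
        "FTE" if fte_failed else "",
        "Gender" if gender_failed else "",
        "Employment Category" if category_failed else "",
    ]
    parts.sort()                 # empty strings sort before any non-empty string
    joined = ",".join(parts)     # all unwanted commas are now at the front
    return joined.strip(",")     # strip removes every leading and trailing comma


print(failed_elements_sorted(True, False, True))
```

Note that sorting changes the order of the names (here "Employment Category" comes before "FTE"), which is the trade-off mentioned above.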

    It seems like a new "List-Remove blanks" function could be a useful addition to Data Studio, possibly worth raising a request for?

    The second function, to add the comment column, could be improved in two ways:

    1. Use a variable for the "Failed Data Elements" result list, rather than having to use the function twice. This improves readability.
    2. Use Is Empty in place of checking whether the list length is 0.
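
The two suggestions above translate roughly to the following Python sketch. The helper, the dictionary input and the warning wording are all assumptions made for illustration; they are not taken from the Data Studio function.

```python
def failed_data_elements(results: dict) -> list:
    """Hypothetical helper: names of the elements whose rule did not pass."""
    return [name for name, passed in results.items() if not passed]


def build_comment(results: dict) -> str:
    failed = failed_data_elements(results)  # evaluated once and reused below
    if not failed:                          # emptiness check instead of len(...) == 0
        return "No action required"
    # Assumed wording; the original comment text is not shown in the thread.
    return "WARNING: please fix " + ", ".join(failed)


print(build_comment({"FTE": False, "Gender": True, "Employment Category": False}))
```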

    You end up with:

    Henry
