Data Improvements

Hi

I have a dataset that has the following fields (as a subset):

First Name | Middle Names | Surname | Address1   | Address2 | Locality | Postcode | Country | DOB
Julie      | Joan         | Smith   | 1 Smith St |          |          |          |         |

We are commencing data cleansing against a more reliable source, and as part of finding duplicates and clustering we found the following:

Source  | First Name | Middle Names | Surname | Address1   | Address2 | Locality | Postcode | Country | DOB
Finance | Julie      | Joan         | Smith   | 1 Smith St |          |          |          |         |
Sales   | Julie      | Joan         | Smith   | 1 Smith St |          | Geelong  | 3220     | Aust    | 01/01/1975

We will request Finance to update their record with the correct data, as per Sales.

After each extraction and data load, we want to be able to monitor that the data in Finance is being updated, and start to use this data on a dashboard to show that the existing data didn't have locality, postcode, etc. As they cleanse and update the data, we can show the percentage of complete records.

Is there an easy way, from one data load to another, to monitor this and show the changes in the data as an output?

thanks

Carolyn

Comments

  • Clinton Jones (Experian Elite)

    Hi @Carolyn, if I interpret your intent correctly, you want to establish a data quality rule that performs a null check on the contact record columns, say locality and postcode.

    You then want to see, for every run, how many records have no locality and/or postcode, produce a list of the offending records, and then see over time whether the records are being corrected and whether the aggregated count is getting better or worse?

    Is that a fair summary?

  • Carolyn (Contributor)

    Yes, this is correct.

  • Clinton Jones (Experian Elite)

    @Carolyn, here is how I would do this.

    Use a validation step to harvest the results, and use snapshots to persist the data.

    You can optionally export the failing rows as well.

    For the trends, you need to make sure you check the Add Batch Timestamp Column option and select multibatch when you name the snapshot data set.

    Every time you run this workflow, it will create three snapshots in this example (a rough sketch of the equivalent logic in code follows the list):

    1. the failing rows
    2. the Group Rule Stats
    3. the Rule Result Stats
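
    Outside the tool, the equivalent logic is roughly the following. This is only a minimal pandas sketch of the idea (null-check rules, a batch timestamp column and appended snapshots), not what Data Studio does internally; the sample data, rule names and file names are all made up for illustration.

    ```python
    import os
    from datetime import datetime, timezone

    import pandas as pd

    # Simulated Finance extract; the column names are taken from Carolyn's post.
    finance = pd.DataFrame([
        {"First Name": "Julie", "Middle Names": "Joan", "Surname": "Smith",
         "Address1": "1 Smith St", "Address2": None, "Locality": None,
         "Postcode": None, "Country": None, "DOB": None},
        {"First Name": "Alex", "Middle Names": None, "Surname": "Brown",
         "Address1": "2 High St", "Address2": None, "Locality": "Geelong",
         "Postcode": "3220", "Country": "Aust", "DOB": "01/01/1980"},
    ])

    # Null-check rules, grouped the way you might group them in the tool.
    rules = {
        "Address completeness": {
            "Locality populated": finance["Locality"].notna(),
            "Postcode populated": finance["Postcode"].notna(),
        },
        "Person completeness": {
            "DOB populated": finance["DOB"].notna(),
        },
    }

    # Equivalent of "Add Batch Timestamp Column": stamp this run.
    batch_ts = datetime.now(timezone.utc).isoformat(timespec="seconds")

    # 1. Failing rows: any record that breaks at least one rule.
    all_checks = pd.concat(
        {name: result for group in rules.values() for name, result in group.items()},
        axis=1,
    )
    failing_rows = finance[~all_checks.all(axis=1)].assign(batch_timestamp=batch_ts)

    # 2. Group rule stats: how many records pass every rule in each group.
    group_stats = pd.DataFrame([
        {"group": group, "passed": int(pd.concat(checks, axis=1).all(axis=1).sum()),
         "total": len(finance), "batch_timestamp": batch_ts}
        for group, checks in rules.items()
    ])

    # 3. Individual rule stats: pass counts per rule.
    rule_stats = pd.DataFrame([
        {"group": group, "rule": rule, "passed": int(result.sum()),
         "total": len(finance), "batch_timestamp": batch_ts}
        for group, checks in rules.items() for rule, result in checks.items()
    ])

    # Multibatch-style snapshots: append each run so the history accumulates.
    for name, frame in [("failing_rows", failing_rows),
                        ("group_rule_stats", group_stats),
                        ("rule_result_stats", rule_stats)]:
        path = f"{name}.csv"
        frame.to_csv(path, mode="a", header=not os.path.exists(path), index=False)
    ```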

    Depending on what you want to report on, you may prefer Groups over individual Rules

    Here is what my rules look like in groups.

    For a single run, my group results look like this.

    By rule, the results look similar but are more granular.

    If you look inside the snapshot, though, what you will see are these stats and the timestamp of each run.

    Here are my results from three different runs.

    Now, to chart these, you'll need the Pivot Marketplace custom step.
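
    For illustration only, the pivot-and-trend part could be sketched like this; it assumes the rule_result_stats.csv layout from the previous snippet and is not the Marketplace step itself.

    ```python
    import pandas as pd

    # Accumulated rule stats from the appended snapshot (layout assumed from
    # the previous sketch: group, rule, passed, total, batch_timestamp).
    stats = pd.read_csv("rule_result_stats.csv")

    # Percentage of records passing each rule, per run.
    stats["pct_passing"] = 100 * stats["passed"] / stats["total"]

    # Pivot so each rule becomes a column and each run a row - the shape a
    # dashboard or charting tool wants for a trend line.
    trend = stats.pivot_table(index="batch_timestamp",
                              columns="rule",
                              values="pct_passing").sort_index()

    # Rough "percentage of complete records" per run: no record can pass all
    # rules more often than it passes its weakest rule, so the minimum pass
    # rate is an upper bound; the failing_rows snapshot gives the exact figure.
    trend["overall_pct_complete"] = trend.min(axis=1)

    print(trend)
    ```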

    We have some upcoming enhancements coming to charting, which @Adrian Westlake would likely love to show in the coming weeks.

    Hope that helps

    Clinton
