Data Discrepancy Comparison Post-Migration

Marco_13112001 · May 2024

Does Aperture have a function that compares two datasets and highlights any discrepancies between them on a row-by-row and column-by-column basis? I've tried the Compare datasets step, but it does not highlight the rows/columns. The scenario is that data has been migrated from an old system to a new system, and as part of our UAT we need to reconcile the new system's data with the transformed data provided for migration.

Specifically, the function should:

Take two datasets as input: the original dataset and the migrated dataset.
Compare each row and each column of the two datasets.
Identify and highlight discrepancies, such as differences in cell values.
Provide a summary of discrepancies, indicating which rows and columns have mismatches.
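For anyone who wants to see the requested logic outside Data Studio, the four requirements above can be sketched in plain Python. This is a minimal sketch, not Aperture's implementation. It assumes both datasets are loaded as lists of dictionaries and that a key column (UPRN here) identifies each row uniquely in each dataset. The function name compare_datasets and the sample rows are hypothetical.

```python
from collections import Counter


def compare_datasets(original, migrated, key, columns):
    """Compare two lists of row dicts, matched on `key`, cell by cell.

    Assumes `key` is unique within each dataset. Returns a list of
    (key, column, original_value, migrated_value) tuples plus a summary.
    """
    orig_by_key = {row[key]: row for row in original}
    mig_by_key = {row[key]: row for row in migrated}

    # Rows that exist on only one side cannot be compared cell by cell.
    only_in_original = sorted(orig_by_key.keys() - mig_by_key.keys())
    only_in_migrated = sorted(mig_by_key.keys() - orig_by_key.keys())

    discrepancies = []
    for k in sorted(orig_by_key.keys() & mig_by_key.keys()):
        o, m = orig_by_key[k], mig_by_key[k]
        for col in columns:
            # A column missing from a row is treated as None.
            if o.get(col) != m.get(col):
                discrepancies.append((k, col, o.get(col), m.get(col)))

    summary = {
        "mismatched_rows": len({d[0] for d in discrepancies}),
        "mismatches_per_column": Counter(d[1] for d in discrepancies),
        "only_in_original": only_in_original,
        "only_in_migrated": only_in_migrated,
    }
    return discrepancies, summary


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    original = [
        {"UPRN": "100", "Street": "High St", "Postcode": "AB1 2CD"},
        {"UPRN": "101", "Street": "Low Rd", "Postcode": "AB1 3EF"},
        {"UPRN": "102", "Street": "Mill Ln", "Postcode": "AB1 4GH"},
    ]
    migrated = [
        {"UPRN": "100", "Street": "High Street", "Postcode": "AB1 2CD"},
        {"UPRN": "101", "Street": "Low Rd", "Postcode": "AB1 3EF"},
        {"UPRN": "103", "Street": "New Ct", "Postcode": "AB1 5JK"},
    ]
    diffs, summary = compare_datasets(original, migrated, "UPRN", ["Street", "Postcode"])
    for d in diffs:
        print(d)  # ('100', 'Street', 'High St', 'High Street')
    print(summary["only_in_original"], summary["only_in_migrated"])  # ['102'] ['103']
```

Each tuple pinpoints one mismatching cell (row key plus column), and the summary counts mismatches per column, which covers the row-level and column-level views asked for.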

Many thanks in anticipation

Best regards,

Marco

Josh Boxer · May 2024

Which version of Data Studio are you using? It's unclear why the Compare step isn't helpful here. It does highlight discrepancies per row for any or all of the selected columns, and it also provides summary statistics showing where the changes have occurred.

https://docs.experianaperture.io/data-quality/hosted-aperture-data-studio/data-studio-objects/workflows/#compare~workflow-steps

https://community.experianaperture.io/discussion/1257/compare-two-data-sources

Josh Boxer · May 2024

2.13 has the detailed output: https://community.experianaperture.io/discussion/1216/
2.13.4 has additional summary statistics: https://community.experianaperture.io/discussion/1247/

Marco_13112001 · May 2024

@Josh Boxer Thanks for the response. We are currently on version 2.12.12.90, which is why I cannot get these results.

Marco_13112001 · May 2024

@Josh Boxer We have upgraded to the latest version and can now use the Compare step as suggested. However, I'm having trouble setting up the step and would like your help clarifying some points. My understanding is that the key should be an identifying value common to both datasets, i.e. we are using UPRNs as the key, but the results are flagged as duplicates and the record is removed from the analysis. Is my understanding wrong?

Josh Boxer · May 2024

The key is the identifier of the row/record to be compared. It should appear only once in each source.
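Before running a keyed comparison, it can help to check whether the chosen key really is unique in each source. A minimal Python sketch, assuming each dataset is a list of row dictionaries; find_duplicate_keys is a hypothetical helper, not a Data Studio feature:

```python
from collections import Counter


def find_duplicate_keys(rows, key):
    """Return {key_value: occurrences} for every key value seen more than once."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical sample data: UPRN "100" appears twice.
    source = [{"UPRN": "100"}, {"UPRN": "100"}, {"UPRN": "101"}]
    print(find_duplicate_keys(source, "UPRN"))  # {'100': 2}
```

An empty result for both the source and the target means the column can serve as a one-row-per-key comparison key.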

Marco_13112001 · May 2024

@Josh Boxer Will the Compare step not work if the key is repeated within the datasets? The reason I ask is that the source and target datasets contain multiple records related to the same UPRN.

Josh Boxer · May 2024

If these rows are identical, deduplicate (Group) the data before comparing. If they are not identical, how would you check that the rows match, i.e. that UPRN1 in datasetA is being compared with the correct UPRN1 row in datasetB?
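The distinction between identical and non-identical duplicates can be sketched in plain Python. This is an illustration under assumptions (rows are dictionaries, and "identical" means equal in every column), not a description of how the Group step works internally; deduplicate_by_key and the sample rows are hypothetical.

```python
def deduplicate_by_key(rows, key):
    """Collapse exact duplicate rows and report keys whose rows genuinely differ.

    Returns (unique_rows, conflicting_keys). Rows that share a key but differ
    in any column are excluded from unique_rows and their key is listed in
    conflicting_keys, because they cannot be paired reliably with rows in the
    other dataset.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)

    unique_rows, conflicting_keys = [], []
    for value, group in groups.items():
        distinct = []
        for row in group:
            if row not in distinct:  # dict equality compares every column
                distinct.append(row)
        if len(distinct) == 1:
            unique_rows.append(distinct[0])
        else:
            conflicting_keys.append(value)
    return unique_rows, conflicting_keys


if __name__ == "__main__":
    # Hypothetical sample data: UPRN "100" is an exact duplicate,
    # UPRN "101" has two rows that differ in Street.
    rows = [
        {"UPRN": "100", "Street": "High St"},
        {"UPRN": "100", "Street": "High St"},
        {"UPRN": "101", "Street": "Low Rd"},
        {"UPRN": "101", "Street": "Mill Ln"},
    ]
    unique_rows, conflicts = deduplicate_by_key(rows, "UPRN")
    print(unique_rows)  # [{'UPRN': '100', 'Street': 'High St'}]
    print(conflicts)    # ['101']
```

Keys that end up in conflicting_keys are exactly the ambiguous case: without some other way to tell those rows apart, there is no reliable pairing for them in a keyed comparison.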
