Document Data Lineage in my Aperture installation
I have recently joined a company using Aperture to manage their data quality.
They have set up several Spaces, each of which has some of its own Datasets and some Views shared from a central Space.
I would like a clear picture of all the database objects referenced in the Datasets, Views and Workflows that produce our data, to support documentation of our lineage.
I can go into the central views and 'Map Source' individually and then copy that information externally.
Is there a smart way to list all data sources in a single file, either for the whole installation or for each Space?
Best Answers
Hi Dom
My first thought would be to filter the Dataset screen to show only External sources (excluding files and snapshots). Hopefully these are named in a way that makes clear which database table is being referenced (or someone has added Summary details to make this clear)?
For each Dataset, view the Associations (Options --> Associations) to see where it is being used:
In this example a single Dataset is being referenced by 4 Workflows and 1 View. Associations can also include other objects, such as a Function parameter, a Chart, etc.
Something else that might help is the List Datasets API call, which returns all the datasets with their type, volume of data, space, load date, etc.: https://docs.experianaperture.io/data-quality/aperture-data-studio-v2/extend-data-studio-functionality/use-the-rest-api/#dataset-operations
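As a rough sketch of how the output of that call could end up in a single file: the Python below fetches a JSON list from a URL you supply and flattens it to CSV. Everything specific here is an assumption rather than something taken from the Aperture documentation. The endpoint URL and the authentication header are passed in by you (check the REST API page linked above for the real values), and the field names `space`, `name` and `sourceType` are hypothetical placeholders for whatever keys the response actually contains. It also assumes the response body is a plain JSON array of dataset records.

```python
import csv
import io
import json
import urllib.request


def fetch_datasets(url, headers):
    """GET the given URL and return the decoded JSON body.

    Both `url` and `headers` are supplied by the caller; take the real
    List Datasets endpoint and authentication header from the REST API docs.
    """
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def datasets_to_csv(datasets, fields):
    """Flatten a list of dataset records (dicts) into CSV text.

    `fields` lists the keys to export (hypothetical names, adjust to the
    real response). Keys missing from a record are written as blanks and
    keys not listed in `fields` are dropped.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in datasets:
        writer.writerow({field: record.get(field, "") for field in fields})
    return buffer.getvalue()
```

With that in place, something like `open("datasets.csv", "w").write(datasets_to_csv(fetch_datasets(url, headers), ["space", "name", "sourceType"]))` would give one file for the whole installation, which you could then sort or filter by Space in a spreadsheet.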
Hi Josh,
Thank you, that's very helpful.
Sadly we are still on 2.1.11, but this adds weight to the case for getting (and staying) up to date.