Searching for metadata in the UI (e.g. from the Datasets view or elsewhere)
Hi,
I'm working on a data discovery project that needs to identify the best sources of critical data assets across the business. We have four systems and nearly 3000 tables, some of which have many hundreds of columns. I've used the new bulk load feature to load the data, but now I'm struggling to navigate the metadata that has been generated.
How do I search for different types of metadata in Aperture? I can't see how to search by column name, for example, or how to find all columns with a similar pattern.
Thanks,
Phil
Answers
Hi @Phil Watt
If I understand your ask correctly: you have many tables, and within those tables there may be a subset whose columns don't necessarily share the same column names or the same data, but which may contain data such as telephone numbers (for example).
Data Studio may have identified something like the following:
Data Tagging 'may' have identified these columns correctly as Phone Numbers, but more importantly, you feel that the formats or patterns in the data may actually be more indicative - is that correct?
You now want to know how you could collate a list of all of these tables and columns so that you can decide what to do next?
Is that what you are asking?
Clinton
Hi Clinton,
Yes, and no...
The process you describe is a big part of our approach. But before that, we are looking for a way to serendipitously navigate the data. In your example, we might want to ask:
Having the same pattern and data type could make two columns a good candidate for a join key. This can be further clarified with uniqueness and completeness: for example, would it be useful to roll up/deduplicate one or both columns before joining?
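To make that concrete, here is a rough sketch of the kind of ranking we'd like to do, written in Python/pandas against a hypothetical export of the profiling results. The file name and column names (dominant_pattern, distinct_count, non_null_count, etc.) are just placeholders, not the actual Data Studio profile schema:

```python
import pandas as pd

# Hypothetical export of profiling results: one row per profiled column.
# The column names below are illustrative placeholders.
profile = pd.read_csv("profile_results.csv")

rows = profile["row_count"].clip(lower=1)  # guard against empty tables
profile["completeness"] = profile["non_null_count"] / rows
profile["uniqueness"] = profile["distinct_count"] / rows

# Candidate join keys: columns in different tables that share the same
# dominant pattern and data type, and are nearly complete on both sides.
candidates = profile.merge(
    profile, on=["dominant_pattern", "data_type"], suffixes=("_a", "_b")
)
candidates = candidates[
    (candidates["table_name_a"] < candidates["table_name_b"])  # drop self/duplicate pairs
    & (candidates["completeness_a"] > 0.95)
    & (candidates["completeness_b"] > 0.95)
]

# Low uniqueness on one side suggests that column may need a rollup or
# deduplication before it is used as a join key.
print(
    candidates[
        ["table_name_a", "column_name_a", "uniqueness_a",
         "table_name_b", "column_name_b", "uniqueness_b"]
    ].sort_values(["uniqueness_a", "uniqueness_b"], ascending=False)
)
```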
Does this make more sense?
Thanks and regards,
Phil
Hi Phil
OK, that makes sense. There isn't a native way within Data Studio to see all tables that have the string "phone" as a column label. The way this would have to be done is by profiling the data and then dumping the profile results into an external database.
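For example, once the profile results have been dumped to an external database, a query along these lines would answer the "phone" question. The database file, table and column names here are illustrative only, not an actual Data Studio schema:

```python
import sqlite3

# Assumes the profiling results have been dumped to a table like
# profile_results(table_name, column_name, tagged_domain, dominant_pattern).
conn = sqlite3.connect("metadata_catalog.db")

query = """
SELECT table_name, column_name, tagged_domain, dominant_pattern
FROM profile_results
WHERE LOWER(column_name) LIKE '%phone%'
   OR tagged_domain = 'Phone Number'
ORDER BY table_name, column_name;
"""

for row in conn.execute(query):
    print(row)

conn.close()
```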
This could potentially also be done by using parameterised workflows containing a profile step and then writing the profile results to a snapshot which could be explored, but my sense is that this wouldn't be a very friendly, practical or efficient approach.
Given the availability of the APIs, the best approach, I think, might be to push the datasets via the API, profile them, and then write the profiling results (with lineage) to an external table where you can slice and dice the data as required.
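As a very rough sketch of that pipeline (the base URL, endpoint paths and JSON field names below are placeholders - the real routes, payloads and authentication are in the Data Studio REST API documentation):

```python
import requests

# Placeholders only - substitute the real API routes and credentials.
BASE_URL = "https://datastudio.example.com/api"
HEADERS = {"Authorization": "Bearer <api-key>"}

def store_profile_row(row):
    """Stand-in for writing a row to the external metadata table."""
    print(row)

# 1. Enumerate the datasets created by the bulk load (hypothetical endpoint).
datasets = requests.get(f"{BASE_URL}/datasets", headers=HEADERS).json()

for ds in datasets:
    # 2. Fetch the profiling results for each dataset (hypothetical endpoint).
    profile = requests.get(f"{BASE_URL}/datasets/{ds['id']}/profile",
                           headers=HEADERS).json()

    # 3. Flatten the per-column results, keeping lineage back to the source
    #    system and table, and push them to the external table.
    for col in profile.get("columns", []):
        store_profile_row({
            "source_system": ds.get("source"),
            "table_name": ds.get("name"),
            "column_name": col.get("name"),
            "tagged_domain": col.get("tag"),
            "dominant_pattern": col.get("pattern"),
        })
```

Once the results are sitting in that external table, queries like the earlier "phone" example become trivial, and you can join the profile metadata back to your own catalogue of critical data assets.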
I'll be interested to hear whether @Chris Allan, @Henry Simms or @Khen Hah thinks there is an easier way to do this.
Clinton