Searching for metadata in the UI (e.g from datasets view or others)

Hi,

I'm working on a data discovery project that needs to identify the best sources of critical data assets across the business. We have four systems and nearly 3000 tables - some of which have many hundreds of columns. I've used the new bulk load feature to load the data, but now I'm struggling to navigate the metadata generated.

How do I search for different types of metadata in Aperture? I can't see how to search by column name, for example. How do I find all columns with a similar pattern, etc?

Thanks,

Phil

Answers

  • Clinton JonesClinton Jones Administrator

    Hi @Phil Watt

    If I understand your ask, correctly, you have many tables and within those tables you may have a subset that have columns of data that don't necessarily have the same column names nor the same same data but which may contain data like a telephone number (for examples)

    Data Studio will identified perhaps the following :

    • Table1
      • Phone with formats 999S999S9999 and 999S9999S9999
    • Table2
      • Phone_No with format 999S9999S9999
    • Table3
      • Home_Phone with formats 999S999S9999 and 999S9999S9999
      • Mobile_Phone with format 999S999S9999

    Data Tagging 'may' have identified these columns correctly as Phone Numbers but more importantly, you feel that the formats or Patterns in the data actually may be more indicative - is that correct?

    You now, want to know how you could collate a list of all of these tables and all of these columns so that you can decide what to do next ?

    Is that what you are asking?

    Clinton

  • Hi Clinton,

    Yes, and no...

    The process you describe is a big part of our approach. But before, that, we are looking for a way to serendipitously navigate the data. In your example, we might want to ask:

    • what tables have a column name that contains the string "phone"?
    • what columns have a data with the format "999S999S9999" and/or "999S9999S9999"?
    • Do these columns share the same data types?
    • Can we sort the results by uniqueness or completeness, for example?
    • etc.

    Having the same pattern and datatype could make two columns a good candidate for a join key. This can be further clarified with the uniqueness and completeness. For example, would it be useful to rollup/deduplicate one or both columns before joining?

    Does this make more sense?

    Thanks and regards,

    Phil

  • Clinton JonesClinton Jones Administrator

    Hi Phil

    ok that makes sense. There isn't a native way within Data Studio to see all tables that have the string "phone" as a column label. The way, this would have to be done, is via a Data Profiling of the data and then a dump of the profile results into an external database.

    This could potentially also be done by using parameterised workflows containing a profile step and then writing the profile results to a snapshot which could be explored but my sense is that this wouldn't be a very friendly, practical or efficient approach.

    Given the availability of the APIs, the best approach I think, might be to push the datasets via API, profile them and then write the profiling results (with lineage) to an external table where you can slice and dice the data as required.

    I'll be interested to hear whether @Chris Allan, @Henry Simms or @Khen Hah thinks there is an easier way to do this.

    Clinton

Sign In or Register to comment.