Profiling all tables of a source

Hi team,
A partner is using Aperture Data Studio to profile sources on the client site so that he can recommend object mappings to the client and confirm what the client is attempting to map. They want to quickly check the validity of the fields and their contents against what the client thinks they need. The initial profiling piece will help them with their source-to-target mapping.
• 2 Source databases each having 1,382 tables
• 1 Source database having 1,668 tables
• 1 Source database having 2,184 tables
• 2 Source databases each having 138 tables

Some of the above have a considerable number of tables. Is there any way to easily set up profiling for all tables per connection?
I understand this is not a recommended use of the profiling feature, but I wanted to check before getting back to them.
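Outside Data Studio, the "profile every table on a connection" idea boils down to two steps: enumerate the tables from the database catalog, then compute a few basic statistics for each column. The sketch below is a minimal illustration that assumes Python's built-in `sqlite3` as a stand-in for the client databases (other systems expose their table list through catalog views such as `information_schema.tables` rather than `sqlite_master`). It is not how Aperture Data Studio profiles data, and the function names are hypothetical.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all user tables in the connected SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]


def quote_ident(name):
    """Quote an identifier for SQLite, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def profile_table(conn, table):
    """Compute basic per-column statistics: row count, null count, distinct count."""
    t = quote_ident(table)
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({t})")]
    total = conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
    profile = {}
    for col in columns:
        c = quote_ident(col)
        nulls, distinct = conn.execute(
            f"SELECT SUM(CASE WHEN {c} IS NULL THEN 1 ELSE 0 END), "
            f"COUNT(DISTINCT {c}) FROM {t}"
        ).fetchone()
        # SUM over an empty table returns NULL, so fall back to 0.
        profile[col] = {"rows": total, "nulls": nulls or 0, "distinct": distinct}
    return profile


def profile_all_tables(conn):
    """Profile every table on the connection, one table at a time."""
    return {table: profile_table(conn, table) for table in list_tables(conn)}


# Small in-memory demo database standing in for a real source connection.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER, email TEXT);
INSERT INTO customer VALUES (1, 'a@x.com'), (2, NULL), (3, 'a@x.com');
CREATE TABLE orders (id INTEGER, customer_id INTEGER);
INSERT INTO orders VALUES (10, 1), (11, 1);
""")
result = profile_all_tables(conn)
print(result["customer"]["email"])  # {'rows': 3, 'nulls': 1, 'distinct': 1}
```

Run serially over thousands of tables, a loop like this is exactly the time and resource cost discussed in the replies, so a real job would need batching or scheduling.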

Thank you!
Shamma

Answers

  • Shamma Raghib (Experian Employee)

    Thanks Clinton!

  • Hi @Clinton Jones. Is there any update to your response above that applies to the 2.0 roadmap?


    I can certainly see a use case here.

    I was wondering whether it would be possible to create data-driven workflows that could, for example, profile specific tables. If sufficient metadata were provided to identify the data source, entities and attributes we wish to profile, we could pass these as parameters to execute a "profiling" workflow via a REST API call. Does that make sense?
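    The metadata-driven idea can be sketched as grouping (source, entity, attribute) rows into one parameterized request per entity. This is a minimal sketch under loud assumptions: the endpoint URL and the `{"parameters": {...}}` payload shape are hypothetical placeholders, not a documented Aperture Data Studio REST API. The code only builds `urllib.request.Request` objects; it does not send them.

```python
import json
from collections import defaultdict
from urllib.request import Request

# HYPOTHETICAL endpoint and payload shape, for illustration only; this is not
# a documented Aperture Data Studio REST API.
PROFILE_ENDPOINT = "https://datastudio.example.com/api/workflows/profiling/execute"


def build_profiling_requests(metadata_rows, endpoint=PROFILE_ENDPOINT):
    """Group (source, entity, attribute) rows into one POST request per entity."""
    grouped = defaultdict(list)
    for source, entity, attribute in metadata_rows:
        grouped[(source, entity)].append(attribute)

    built = []
    for (source, entity), attributes in sorted(grouped.items()):
        body = json.dumps({
            "parameters": {"source": source, "entity": entity, "attributes": attributes}
        }).encode("utf-8")
        built.append(Request(endpoint, data=body, method="POST",
                             headers={"Content-Type": "application/json"}))
    return built


metadata = [
    ("CRM", "customer", "email"),
    ("CRM", "customer", "postcode"),
    ("ERP", "invoice", "amount"),
]
reqs = build_profiling_requests(metadata)
for r in reqs:
    # Actually sending would be urllib.request.urlopen(r) against a real endpoint.
    print(r.get_method(), json.loads(r.data)["parameters"])
```

    Keeping the metadata as plain rows means the same list could come from a spreadsheet or a catalog export; each entity becomes one independent workflow run.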


    In addition, is there any plan, or any technical constraint, regarding adding multi-column value or frequency analysis, as there was in Pandora?
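    For readers unfamiliar with the term, multi-column frequency analysis counts how often each combination of values occurs across several columns, rather than one column at a time. The sketch below illustrates that computation with `collections.Counter`; it is an assumption-level illustration, not Pandora's or Data Studio's implementation, and `multi_column_frequency` is a hypothetical name.

```python
from collections import Counter


def multi_column_frequency(rows, columns):
    """Count each combination of values across `columns`, most frequent first.

    Ties keep the order in which combinations were first encountered.
    """
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    return counts.most_common()


rows = [
    {"city": "London", "country": "UK"},
    {"city": "London", "country": "UK"},
    {"city": "London", "country": "CA"},
    {"city": "Paris", "country": "FR"},
]
result = multi_column_frequency(rows, ["city", "country"])
print(result)
# [(('London', 'UK'), 2), (('London', 'CA'), 1), (('Paris', 'FR'), 1)]
```

    Single-column profiling would report "London" three times; only the combined view shows the less common (London, CA) pairing that a mapping review might want to question.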


    Many thanks

    Daniel

  • Clinton Jones (Experian Elite)

    @DTAconsulting What we're looking to do is instrument Data Studio to support cataloging solutions; this would make it easier to add many sources in a single pass.

    With the new design of v2 sources, you can set up a view that is a profile of a dataset. However, views are generated dynamically, so we would have to experiment with the practicality of combining a "drop zone" with a profiling view that is generated on the latest version of the data. That does raise the question of whether you should instead have a specific intent, expressed as a workflow that runs on a schedule.

    In our experience, mass profiling is an uncommon requirement and more of an edge case; it is time-consuming and resource-intensive.

    We may need to look at a trigger on certain kinds of views tied to drop zone events.
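    A trigger tied to drop zone events can be sketched as a watcher that notices new or changed files in a directory and fires a profiling action for each. This is a minimal, hypothetical sketch: `DropZoneWatcher` and its `profile_callback` stand in for whatever would actually regenerate a profiling view, and polling with `os.scandir` is a portable simplification chosen here; it is not how Data Studio implements drop zones.

```python
import os
import tempfile


class DropZoneWatcher:
    """Poll a directory and fire a callback for each new or modified file.

    HYPOTHETICAL: `profile_callback` stands in for whatever would regenerate
    the profiling view; it is not a Data Studio API.
    """

    def __init__(self, path, profile_callback, suffixes=(".csv",)):
        self.path = path
        self.profile_callback = profile_callback
        self.suffixes = suffixes
        self.seen = {}  # file name -> last seen modification time (ns)

    def poll(self):
        """Check the drop zone once; return the file names that triggered profiling."""
        triggered = []
        with os.scandir(self.path) as entries:
            for entry in sorted(entries, key=lambda e: e.name):
                if not entry.is_file() or not entry.name.endswith(self.suffixes):
                    continue  # ignore directories and non-matching file types
                mtime = entry.stat().st_mtime_ns
                if self.seen.get(entry.name) != mtime:  # new or modified file
                    self.seen[entry.name] = mtime
                    self.profile_callback(entry.path)
                    triggered.append(entry.name)
        return triggered


profiled = []
with tempfile.TemporaryDirectory() as zone:
    watcher = DropZoneWatcher(zone, profiled.append)
    open(os.path.join(zone, "customers.csv"), "w").close()
    open(os.path.join(zone, "notes.txt"), "w").close()
    first = watcher.poll()   # ['customers.csv']
    second = watcher.poll()  # [] because nothing changed since the last poll
```

    Filtering by suffix is one simple way to restrict the trigger to "certain kinds" of drops; a scheduled workflow would simply call `poll()` on a fixed interval.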
