Use of SQL to refine source data prior to loading into Aperture

Nigel Light
Nigel Light Contributor
edited December 2023 in General

Hi,

I was just wondering whether other users would find a Pandora-like feature to pre-filter prior to loading data useful? (I certainly would - especially when loading large database tables only to immediately filter-out the majority of the historic data)

Similarly, who would find running SQL queries to load data into Aperture useful?

Don't forget - Christmas is coming, it would be a great gift to receive this in 2.x... ?

Nige


 

Comments

  • Akshay Davis
    Akshay Davis Experian Super Contributor

    @Nigel Light I believe one of the reasons it was removed in Aperture Data Studio was the security concern over letting any SQL be run against the database. How else were you using that feature? Or was it primarily to run filters prior to loading?

  • It would be great to be able to pre-filter data (as that would greatly reduce the amount we import when retrieving data from a data warehouse that contains slowly changing historical data).

    Additionally, it would also be very useful to be able to run complex SQL code, joining together several large tables, applying filters etc to create a single set of reference data (eg client keys used for a particular client mailing) from within Aperture so we could then use this set for further quality analysis eg checking addresses for a set of clients. Without this feature we either have to create this file by running SQL outside of Aperture and then exporting/saving from SQL and loading into Aperture (with inherent risks of getting the wrong file) or, alternatively, recreating a complex SQL job in Aperture (plus the overhead of reading in several extremely large tables each time we want to run the job)

  • Clinton Jones
    Clinton Jones Experian Elite

    @Chris Allan and the development team have been mulling over push down processing for some time, not just for conventional RDBMS but also for some of those big iron non SQL type database sources. This is still a bit far out on the roadmap to materialise or show anything tangible but the circumstances under which you need to be able to do this @Nigel Light are important for helping us to set a priority.

    My distillation of the ask is in this instance is a SQL injection which is then pushed down to the source. I am guessing that you don't want to create a materialized or indexed view based on that SQL in the source database - I would be interested to know why particularly?

    One of the challenges with SQL injection is compliance and security as outlined by @Akshay Davis; read only user credentials can mitigate that, but the greater problem and concern may ultimately be performance and just downright bad SQL.

  • Thanks Clinton

    Re:your query; for us this is just the logistical issues with setting up views against Development/Live corporate databases and the need to involve other parties to create/promote/reapply these solutions, based on existing database permissions, which extends development timescales (and further complicates things if we wish to subsequently amend these based on results obtained).

    Nige 

  • Clinton Jones
    Clinton Jones Experian Elite
    edited September 2019

    @Nigel Light I completely get it, this is the challenge when you're on the business or data side and the keys to the kingdom on the database or system side are owned either by IT or a third party service provider. It does push one to having to consider data federation very carefully. We we to make that process easy and straightforward while ensuring that we're always compliant. 'views' in v2.0 will come some way toward help with this.

  • Furthermore re:My Files

    Q - are these just 'my' files or can they be shared with the wider team?

    (the latter would be a useful feature, otherwise we will all have to go through the same process of uploading them and risk introducing different versions of source data) - another reason for getting embedded SQL

  • [Deleted User]
    [Deleted User] Experian Contributor

    Actually the use of the pre filter in Pandora was very handy to load the Delta data only in Aperture, like CDC

    Now we need to load the same dataset everyday and filter it out, or create a view on the source

  • Clinton Jones
    Clinton Jones Experian Elite

    @Ahmed Elshahawy unfortunately SQL injection often doesn't stand up to the sniff test of audit - here's a view on it https://pentest-tools.com/blog/sql-injection-attacks/

    This vulnerability is one of the reasons we don't support it in Data Studio.

  • That is a shame but with the vulnerability risk, I guess understandable

    No chance that there could be a pre-filter but limit it to comparing dates/times (so that deltas/most recent events can be loaded)?

    Nige