Updating Data via Aperture vs SQL

Hi,

I was hoping fellow customers or Experian staff could share their experiences with this one. We want to update data in the database directly using Aperture (through the Export step, via JDBC), but our DBA wants the updates done via SQL scripting and is hesitant to use Aperture. I am pushing for Aperture because it is a controlled, rules-based environment. Any thoughts / input from this group?

Thanks

Answers

  • Mirjam Schuke (Administrator)

    Hi James, where does your DBA see the main benefit of SQL (performance, control, flexibility, something else), and what are their objections to Data Studio?

  • I think it's control: knowing exactly what is happening. I suggested including the log showing the updates in a test environment to get approval. I was just hoping there might be an anecdote or comparison explaining why using Aperture is low-risk and a preferred choice.

  • Henry Simms (Administrator)

    For me, Data Studio with JDBC is the preferred choice for most SQL-based database updates, for a few reasons:

    1. Ability to validate data against the target schema in a workflow before attempting any DB updates. I can use a Validate step to check data types, nulls, uniqueness, lengths etc. to construct a set of data that should be able to enter my database without errors. I can even use the Suggest Validation Rules feature against my existing data to speed up rule creation, or apply an existing ruleset. (See the first sketch after this list for a rough idea of what the equivalent hand-rolled check involves.)
    2. Control over how data is updated and how update errors are handled. Depending on the use case I use a variety of Export step options, including "Continue Workflow execution on JDBC exports error" with "Show failing rows" when I want to specifically capture and handle any insert or update failures. Using events and notifications I can get Data Studio to proactively alert on errored export rows and trigger downstream processes. I can also usually get detailed logging from the drivers, through simple configuration, to indicate exactly what calls are being made to the database. (The second sketch after this list shows what capturing failing rows looks like at the plain-JDBC level.)
    3. Features like authentication and data security. The JDBC driver will (in most cases) have built-in, well-documented settings that let you easily configure the authentication mechanism and the data encryption in transit required by your organization. This means I don't have to worry about handling the movement of data to and from the script execution layer. Data Studio also handles credential storage and access on a per-user or per-group basis. (The second sketch after this list also shows the encryption settings on the connection string.)
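    To make point 1 concrete, here is a rough, illustrative sketch of what the equivalent pre-flight check would involve if scripted by hand over plain JDBC. Everything below is invented for illustration (a PostgreSQL target, a "contact" table, the sample values); the Validate step does this declaratively inside the workflow.

        import java.sql.*;
        import java.util.*;

        // Illustrative sketch only: read column metadata for a (hypothetical) target
        // table and pre-check candidate values before attempting any inserts/updates.
        public class PreFlightCheck {

            // Nullability and length rules taken from the target schema.
            record ColumnRule(String name, boolean nullable, int maxLength) {}

            public static void main(String[] args) throws SQLException {
                // Connection details are placeholders.
                try (Connection conn = DriverManager.getConnection(
                        "jdbc:postgresql://dbhost:5432/crm", "app_user", "secret")) {

                    // Pull nullability and column sizes for the target table.
                    Map<String, ColumnRule> rules = new HashMap<>();
                    DatabaseMetaData meta = conn.getMetaData();
                    try (ResultSet cols = meta.getColumns(null, "public", "contact", null)) {
                        while (cols.next()) {
                            String name = cols.getString("COLUMN_NAME");
                            rules.put(name, new ColumnRule(
                                    name,
                                    "YES".equals(cols.getString("IS_NULLABLE")),
                                    cols.getInt("COLUMN_SIZE")));
                        }
                    }

                    // One candidate row (column -> value); in practice this would come
                    // from a file or staging area.
                    Map<String, String> row = Map.of("email", "someone@example.com", "surname", "");

                    // Flag values the target schema would reject.
                    for (var entry : row.entrySet()) {
                        ColumnRule rule = rules.get(entry.getKey());
                        if (rule == null) continue;
                        String value = entry.getValue();
                        if (!rule.nullable() && value.isEmpty()) {
                            System.out.println("Would fail NOT NULL: " + rule.name());
                        } else if (value.length() > rule.maxLength()) {
                            System.out.println("Would exceed length: " + rule.name());
                        }
                    }
                }
            }
        }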
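    And for points 2 and 3, this is roughly what the plain-JDBC version of "show failing rows" plus encryption in transit looks like. The connection properties shown (ssl / sslmode) are PostgreSQL's; SQL Server's equivalent is encrypt=true. Table and column names are invented, and exact batch-failure behaviour varies by driver, so treat this as a sketch rather than a recipe.

        import java.sql.*;
        import java.util.List;

        // Illustrative sketch: batched updates over an encrypted connection, capturing
        // the individual rows that fail instead of abandoning the whole run.
        public class UpdateWithFailureCapture {

            public static void main(String[] args) throws SQLException {
                // Encryption in transit via documented driver connection properties
                // (PostgreSQL shown; other drivers use different property names).
                String url = "jdbc:postgresql://dbhost:5432/crm?ssl=true&sslmode=verify-full";

                // Invented example rows: new email keyed by customer_id.
                List<String[]> rows = List.of(
                        new String[]{"alice@example.com", "C001"},
                        new String[]{"bob@example.com", "C002"});

                try (Connection conn = DriverManager.getConnection(url, "app_user", "secret");
                     PreparedStatement ps = conn.prepareStatement(
                             "UPDATE contact SET email = ? WHERE customer_id = ?")) {

                    for (String[] row : rows) {
                        ps.setString(1, row[0]);
                        ps.setString(2, row[1]);
                        ps.addBatch();
                    }

                    try {
                        ps.executeBatch();
                    } catch (BatchUpdateException e) {
                        // Depending on the driver, the update counts mark which statements
                        // failed, so "failing rows" can be logged and routed for review.
                        int[] counts = e.getUpdateCounts();
                        for (int i = 0; i < counts.length && i < rows.size(); i++) {
                            if (counts[i] == Statement.EXECUTE_FAILED) {
                                System.out.println("Failed row: " + String.join(",", rows.get(i)));
                            }
                        }
                    }
                }
            }
        }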

    A scenario I have encountered where using JDBC has proved not to be the best option is very large volumes, where performance and efficiency are of high importance. Data Studio / JDBC inevitably introduces some overhead, and won't always do things in the most efficient way compared to a bespoke script. Many drivers do provide flexibility around, for example, the use of bulk load protocols, but these aren't always fully implemented. (A brief sketch of the batched-insert approach follows.)
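    For the large-volume case, this is roughly the shape of a hand-written batched loader. The reWriteBatchedInserts property is PostgreSQL's batch-rewrite switch, used here purely as an example of a driver-level bulk option; other drivers expose different knobs with their own restrictions, and the staging table below is invented.

        import java.sql.*;

        // Illustrative sketch: pushing large volumes through JDBC in committed chunks,
        // with a driver property that rewrites batched inserts into a more efficient form.
        public class BulkishInsert {

            public static void main(String[] args) throws SQLException {
                // reWriteBatchedInserts is a PostgreSQL JDBC property; assumed here for illustration.
                String url = "jdbc:postgresql://dbhost:5432/crm?reWriteBatchedInserts=true";

                try (Connection conn = DriverManager.getConnection(url, "app_user", "secret");
                     PreparedStatement ps = conn.prepareStatement(
                             "INSERT INTO contact_staging (customer_id, email) VALUES (?, ?)")) {

                    conn.setAutoCommit(false);   // commit in chunks rather than per row

                    for (int i = 1; i <= 100_000; i++) {
                        ps.setString(1, "C" + i);
                        ps.setString(2, "user" + i + "@example.com");
                        ps.addBatch();
                        if (i % 5_000 == 0) {    // flush periodically to bound memory
                            ps.executeBatch();
                            conn.commit();
                        }
                    }
                    ps.executeBatch();           // flush any remainder
                    conn.commit();
                }
            }
        }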

  • @Henry Simms Thanks, those are all really great points. For #2 I am doing something similar - sending the 'Passing' and 'Failing' rows to an output Excel for audit purposes. It's a shame the Export (JDBC) step doesn't allow export of its settings; I could have included those too, showing the DBMS, table name, mode, etc.

    I would still like to hear from fellow Aperture customers about their approach to data updates - pros, cons, and any challenges they faced.

  • Henry Simms (Administrator)

    I like the idea of having the Export (JDBC) step include some metadata for the target table (e.g. table name, DBMS, credentials used, etc.) in the passing and failing rows output, to help users / DBAs debug or resolve failures. I think it would be worth adding a new idea for this.

    Something similar can already be achieved with a bit of work. This approach is based on the fact that the schema for any database table you want to export to from Data Studio must already be loaded as a dataset in Data Studio. Given this, you can extract the Source metadata for the table and join it to the failing (or passing) rows. Here's one approach:

    Getting the source metadata (and then adding a timestamp and formatting to one row):

    Including the source metadata with the failed rows (using a Cartesian join to bring the metadata in to every row - there are other ways to do this too!):

    Resulting snapshot with some failed rows:

  • I can add that to the new idea section. As for your metadata process - good stuff! We think along the same lines: in my output Excel report I have a tab called 'Metadata' where I include all the source metadata you also capture; the only difference is that you join it with your failing rows data. And obviously, this all assumes the source and export datasets are the same. I appreciate your insights and ideas on this.