SSIS and Aperture

Is it feasible to embed an Aperture task within a scheduled SSIS job? This would be used in the initial and CDC jobs from the source into the destination database. An example would be to extract Nominal from the source database, cleanse the data with rules in Aperture, write an output table in Aperture, and use this table as the source table for the ETL process that loads into the destination database.

Answers

  • Clinton Jones, Experian Elite

    Hi @stevenmckinnon, this is an interesting ask and is in fact something that we're looking at as further extensions to Data Studio v2.x.

    At the moment there are REST APIs that allow you to execute workflows using integrations. Execution of workflows via the API is available in both v1 and v2, and the workflow can be triggered by anything in your process orchestration that can call or invoke a REST service.
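    As a rough illustration, a scheduled job (an SSIS script task, a SQL Agent step, or any other scheduler) could trigger a workflow through that REST API. This is a minimal sketch only: the base URL, endpoint path, payload fields and bearer-token header are placeholders rather than the documented Data Studio API, so check the API reference for your version before relying on it.

    ```python
    # Minimal sketch: trigger a Data Studio workflow over REST from a scheduled job.
    # The base URL, endpoint path, auth header and payload shape are assumptions --
    # substitute the values documented for your Data Studio version.
    import requests

    BASE_URL = "https://datastudio.example.com"      # hypothetical server
    API_KEY = "REPLACE_WITH_API_KEY"                 # hypothetical credential
    WORKFLOW_NAME = "Cleanse Nominal"                # hypothetical workflow name

    def execute_workflow(name: str) -> None:
        """Ask Data Studio to run the named workflow and fail loudly if it refuses."""
        response = requests.post(
            f"{BASE_URL}/api/workflows/execute",     # placeholder endpoint
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"workflowName": name},
            timeout=30,
        )
        response.raise_for_status()

    if __name__ == "__main__":
        execute_workflow(WORKFLOW_NAME)
    ```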

    There are a couple of conceptual ways that you could have Data Studio work with SSIS depending on where SSIS features in the process.

    Data Studio at the start of the process

    If your workflow pulls data from files or databases, works on that data, and then rehomes it or performs a round trip into SQL Server, then Data Studio would perform the pre-step filter, sort, group, aggregate, dedupe, validate and append operations ahead of an SSIS action.

    I guess the question would be, what exactly is the SSIS job doing?

    You could also consider doing all the work in Data Studio and then 'publishing' the data via a snapshot, which would be available over ODBC to be picked up by a downstream process. This is a little more brittle, but it is doable. This is typically how we see data prep being done for reporting services or for reporting in other tools like Tableau, Power BI, Qlik and even Excel.
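    As a sketch of that downstream pick-up, the snippet below reads a published snapshot over ODBC. The DSN name and snapshot table name are assumptions; use whatever your Data Studio ODBC driver and published snapshot are actually called.

    ```python
    # Minimal sketch: a downstream process reading a published Data Studio snapshot
    # over ODBC. The DSN and table names are assumptions -- use the DSN configured
    # for your Data Studio ODBC driver and the snapshot you actually published.
    import pyodbc

    def read_snapshot(dsn: str = "ApertureDataStudio", table: str = "NominalCleansed"):
        """Pull the cleansed rows so SSIS (or any ETL tool) can load them onward."""
        conn = pyodbc.connect(f"DSN={dsn}")
        try:
            cursor = conn.cursor()
            cursor.execute(f"SELECT * FROM {table}")   # snapshot exposed as a table
            return cursor.fetchall()
        finally:
            conn.close()

    if __name__ == "__main__":
        rows = read_snapshot()
        print(f"Fetched {len(rows)} cleansed rows from the snapshot")
    ```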

    Data Studio at any point but as the orchestration controller

    Here we would likely recommend a couple of different approaches.

    If you have CDC, have created materialized views of the data, or have some upstream process that produces files, then we would likely suggest using either the data file dropzones in v2 or file-based events (YAML in v1) for file-based deltas. In v2, dropzones are inherent to the way auto-loading of data is done (the workflow still needs to be executed, though, by schedule or explicit invocation). In v1, the YAML implementation kicks off the workflow in response to the appearance of an object in the file system.
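    For illustration, an upstream process that produces delta files can hand them to Data Studio simply by copying them into the dropzone directory. The share path and file-naming convention below are assumptions; use the dropzone configured on your own server.

    ```python
    # Minimal sketch: an upstream CDC/export process dropping a delta file into a
    # Data Studio dropzone directory so auto-loading can pick it up. The dropzone
    # path and naming convention are assumptions.
    import shutil
    from datetime import datetime
    from pathlib import Path

    DROPZONE = Path(r"\\datastudio-server\dropzones\nominal")   # hypothetical share

    def publish_delta(delta_file: Path) -> Path:
        """Copy today's delta extract into the dropzone with a timestamped name."""
        stamped_name = f"nominal_delta_{datetime.now():%Y%m%d_%H%M%S}.csv"
        target = DROPZONE / stamped_name
        shutil.copy2(delta_file, target)   # for large files, consider copying to a
        return target                      # temp name and renaming once complete

    if __name__ == "__main__":
        publish_delta(Path("nominal_delta.csv"))
    ```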

    If the data is already in a database, then we'd recommend using a materialized or indexed view for the source data and a JDBC connector, rather than ingesting all of the data again (perhaps unnecessarily) and re-evaluating it all over again.
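    As a sketch of the database-side preparation, the snippet below creates a schema-bound, indexed view over a hypothetical Nominal table so the JDBC connector only reads the columns that need re-evaluation. Table, column and connection details are assumptions; adapt them to your own schema.

    ```python
    # Minimal sketch: build an indexed view in SQL Server for Data Studio's JDBC
    # connector to read. Object names and the connection string are assumptions.
    # Indexed views need specific SET options (QUOTED_IDENTIFIER ON, ANSI_NULLS ON,
    # etc.); the SQL Server ODBC driver enables these by default.
    import pyodbc

    CREATE_VIEW = """
    CREATE VIEW dbo.vw_NominalForCleansing
    WITH SCHEMABINDING
    AS
    SELECT NominalCode, Description, LastModifiedDate
    FROM dbo.Nominal;
    """

    CREATE_INDEX = """
    CREATE UNIQUE CLUSTERED INDEX IX_vw_NominalForCleansing
    ON dbo.vw_NominalForCleansing (NominalCode);
    """

    def build_indexed_view(conn_str: str) -> None:
        """Create the schema-bound view and the clustered index that materialises it."""
        conn = pyodbc.connect(conn_str, autocommit=True)
        try:
            cursor = conn.cursor()
            cursor.execute(CREATE_VIEW)
            cursor.execute(CREATE_INDEX)
        finally:
            conn.close()

    if __name__ == "__main__":
        build_indexed_view(
            "DRIVER={ODBC Driver 17 for SQL Server};SERVER=source-db;"
            "DATABASE=Finance;Trusted_Connection=yes;"   # hypothetical connection
        )
    ```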

    If you're working with an exotic file type or a special staging area, need special parsing, or cannot pull the data from a standard supported file type or JDBC connection, then a custom step built using the Data Studio SDK should be considered. This is a common use case where, for example, you pull the data from an SFTP site or do some sort of special handling like algorithmic file renaming and file movements. We've also seen this used to execute things like Python scripts external to Data Studio.

    Whatever you're doing here is done in the context of the workflow, with the custom steps simply acting as calls to external resources, services or orchestration components. The workflow resumes with the follow-on steps once the custom step has completed.
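    To give a flavour of the external work a custom step might wrap, the sketch below pulls a source file from an SFTP site into a staging folder ahead of ingestion. It is plain Python using paramiko rather than the Data Studio SDK itself, and the host, credentials and paths are placeholders.

    ```python
    # Minimal sketch of the external work a custom step might delegate to: fetch a
    # source file from SFTP into a local staging folder before the workflow ingests
    # it. Host, credentials and paths are placeholders; this is not the SDK itself.
    import paramiko

    def fetch_from_sftp(host: str, user: str, password: str,
                        remote_path: str, local_path: str) -> None:
        """Download one file from the SFTP site into the staging area."""
        transport = paramiko.Transport((host, 22))
        try:
            transport.connect(username=user, password=password)
            sftp = paramiko.SFTPClient.from_transport(transport)
            sftp.get(remote_path, local_path)
        finally:
            transport.close()

    if __name__ == "__main__":
        fetch_from_sftp(
            host="sftp.example.com",
            user="etl_user",
            password="REPLACE_ME",
            remote_path="/outbound/nominal_extract.csv",
            local_path=r"C:\staging\nominal_extract.csv",
        )
    ```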

    Custom steps can be added anywhere in the workflow in both v1 and v2.
