Triggering workflow executions upon file arrival in Aperture Data Studio v1 versus v2
Aperture Data Studio is commonly used for operational data quality management. In practice, this means regularly pulling up-to-date information from the multiple transactional systems used within your organization so that you can perform data quality checks and improvements centrally. This prepares the data for further uses such as business reporting and analytics.
A typical scenario is loading weekly batches of customer data from a file generated by another application, and then triggering a workflow within Data Studio to validate the data quality before the data is written to a master database.
Aperture Data Studio v1 allows workflows to run unattended when a new version of a file is uploaded. The process is driven by a configuration file, which has to be written in YAML and might be challenging for a non-technical user.
In addition to the YAML configuration file, you will also need to define the location for watched files in a separate properties file. Here's an example of a filedatastores.properties file:
You may encounter errors, especially if you are not sure which parameters and keywords to use in the YAML and properties files.
In Aperture Data Studio v2, there is no need to manually create a YAML file or a filedatastores.properties file, because v2 introduces the concept of a Dataset Dropzone with Notifications. A dataset dropzone folder is created through the user interface by defining an External label when creating a dataset.
When an External label is defined for a dataset, a dataset dropzone folder is automatically created within the Aperture Data Studio database folder location defined during installation.
Configure a Notification Event to automatically trigger one or more published Workflows or Schedules when the dataset is loaded.
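On the producing side, the application that generates the weekly file only needs to place it in the dropzone folder. The following is a minimal Python sketch of one way to do that, not part of Data Studio itself. The function name deliver_to_dropzone, the staging folder, and every path are hypothetical. The sketch assumes the staging folder sits on the same volume as the dropzone folder, so the final rename moves a complete file into place rather than exposing a partially written one.

```python
import os
import shutil
import tempfile
from pathlib import Path


def deliver_to_dropzone(source: Path, staging: Path, dropzone: Path) -> Path:
    """Copy `source` into `dropzone`, keeping its original file name.

    The file is first copied to a temporary name in `staging` and then
    renamed into `dropzone`, so an incomplete copy never appears in the
    dropzone folder. `staging` and `dropzone` are assumed to be on the
    same volume; otherwise os.replace raises OSError.
    """
    staging.mkdir(parents=True, exist_ok=True)
    dropzone.mkdir(parents=True, exist_ok=True)
    target = dropzone / source.name

    # Reserve a unique temporary file name inside the staging folder.
    fd, tmp_name = tempfile.mkstemp(dir=staging, suffix=".part")
    os.close(fd)
    try:
        shutil.copyfile(source, tmp_name)
        # Move the finished copy into the dropzone in a single rename.
        os.replace(tmp_name, target)
    except Exception:
        # Clean up the temporary file if the copy or rename failed.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target


# Example call (hypothetical paths, replace with your own):
# deliver_to_dropzone(
#     Path("exports/customers.csv"),
#     Path("/data/aperture/staging"),
#     Path("/data/aperture/dropzone/customers"),
# )
```

The actual dropzone folder location depends on the database folder chosen during installation and on the External label given to the dataset, so the paths above must be replaced with the values from your own environment.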