Automation - only writing last batch to dataset
Hi,
I have a multi-batch dataset (let's say DS1) that is configured for an external dropzone. When new data gets dropped and loaded into DS1, an automation fires off a workflow. The workflow takes DS1 as its source, does some transformations, grabs metadata etc., and then writes that dataset as a whole to a snapshot (let's say DS2).
However, rather than DS2 having all rows from DS1, it only receives the most recent batch of data. If I run the workflow manually (not via automation), then all the rows from DS1 go to DS2.
Any suggestions? Thanks
Answers
-
Does the automation pass the dataset source to the workflow it executes (using Can supply source when executed)? Or is the workflow's source hard coded?
If it's the former, the workflow is possibly only being triggered with the latest batch. Could you share a screenshot of the automation and the workflow?
0 -
Hi, it is hardcoded in the workflow source.
Workflow=
Automation=
thanks
1 -
I had a look at this, and the behaviour you describe is actually the expected behaviour here.
Despite the workflow being set up to take "all batches", it only gets triggered with the latest batch from the automation. When run manually, it takes all of the batches.
This design avoids any potential confusion over which batches are used, in particular in the scenario where dataset batches are being added and removed rapidly. Having said that, I think it would be useful to provide an option to "use all batches" in the execution, rather than the specific batch that was uploaded / refreshed to trigger the event. This may be a good Idea (feature request) to suggest.
To get the behaviour you want, a couple of approaches have been tried:
- Trigger an 'initial' workflow which subsequently triggers the workflow we originally wanted to trigger. This way the subsequent workflow uses the whole dataset.
- Use multiple snapshots called 'current' and 'previous':
  - Dataset is refreshed
  - Move Current results into Previous
  - Refresh Current with data from the refresh
  - Process data
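To make the 'current' / 'previous' idea a bit more concrete, here is a rough sketch in plain Python. It is only an illustration of the rotation logic, not Data Studio's API: the `Snapshots` class, `on_refresh`, `all_rows` and `process` are all made-up names, and it assumes that "move Current into Previous" means appending (so Previous accumulates the older batches) rather than overwriting.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshots:
    """Hypothetical stand-in for the two snapshots, 'previous' and 'current'."""
    previous: list = field(default_factory=list)
    current: list = field(default_factory=list)

    def on_refresh(self, new_batch):
        # Assumption: "Move Current results into Previous" appends,
        # so 'previous' accumulates every batch seen before this one.
        self.previous.extend(self.current)
        # "Refresh Current with data from the refresh": only the batch
        # that triggered the event ends up in 'current'.
        self.current = list(new_batch)

    def all_rows(self):
        # The processing step reads both snapshots to see every batch.
        return self.previous + self.current


def process(rows):
    # Placeholder for the real transformations; here it just counts rows.
    return len(rows)


if __name__ == "__main__":
    snaps = Snapshots()
    snaps.on_refresh([{"id": 1}, {"id": 2}])  # first batch lands
    snaps.on_refresh([{"id": 3}])             # second batch lands
    print(process(snaps.current))     # rows in the triggering batch only
    print(process(snaps.all_rows()))  # rows across all batches so far
```

The point of the sketch is that the triggered run only ever receives the latest batch, so the earlier rows have to be carried forward somewhere (here, in 'previous') before the processing step can see the whole dataset.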
0 -
Henry - thanks for responding. I totally get why only the last batch would be used in the automation, but I was unaware of it, so some tweak to the automation configuration would definitely be good. Want me to put in the feature request?
0 -
I've added this:
0 -
Well put. Thanks
0