Kill Dataset refresh job via API

Hi,

I am performing bulk data profiling on datasets using API automation. My steps are as follows:

  1. List all datasets to be data-profiled in a CSV file.
  2. Process the CSV row by row.
  3. Trigger a dataset refresh and capture the job ID.
  4. Wait for the job to complete, for a maximum of 30 mins. If the elapsed time exceeds 30 mins, skip the remaining steps and pick the next dataset from the list (see the sketch after this list).
  5. For a successful refresh of a dataset with non-zero rows, call the workflow for data profiling.
  6. Wait for workflow completion.
  7. Repeat steps 1-6.
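
As a rough illustration of the wait in step 4, the polling loop looks something like the sketch below. The base URL, auth header, endpoint path, and status values are assumptions for illustration, not documented Aperture Data Studio API details:

```python
import time
import requests

BASE_URL = "https://your-aperture-host/api/v2"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}   # hypothetical auth scheme

def wait_for_refresh(job_id, max_wait_secs=30 * 60, poll_secs=30):
    """Poll a refresh job until it finishes or the wait budget is spent.

    Returns the final status, or "TIMED_OUT" if the budget elapses
    first. The endpoint path and status values are assumptions, not
    the documented Aperture Data Studio routes.
    """
    deadline = time.monotonic() + max_wait_secs
    while time.monotonic() < deadline:
        resp = requests.get(f"{BASE_URL}/jobs/{job_id}", headers=HEADERS)
        resp.raise_for_status()
        status = resp.json().get("status")
        if status in ("FINISHED", "FAILED"):
            return status
        time.sleep(poll_secs)
    # Budget spent: the caller moves on to the next dataset, but the
    # refresh keeps running server-side, since there is no cancel
    # endpoint to call here.
    return "TIMED_OUT"
```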

The problem arises in step 4: even though the process picks up the next dataset once the elapsed time exceeds 30 mins, the dataset refresh continues in the backend.

Is there a way to terminate the dataset refresh job automatically in such a scenario?

Answers

  • Josh Boxer (Administrator)

    Hi Shreya, I cannot think of a way to do this today. Any reason for the 30 mins? If you knew it would take 31 mins, would you be prepared to wait?
    One thought: there could be an API to request that a job be cancelled. If you think this would be useful, you could suggest it in Ideas and see if others would also like to see it.

  • Thanks for your reply @Josh Boxer. Well, 30 minutes is set up as a time interval that works for most of our datasets, barring a few, and is defined for each dataset in my job config. We have even set a higher Refresh Wait Time based on the initial data load time, which is not always easy to assess, as the initial load of JDBC-type datasets in Experian using an external system is manual. So while doing a bulk load it's difficult to know how long the data load/refresh took, as we don't currently have that option in Experian (it shows "refreshed xx mins/days ago").

    Also, the dataset refresh is being done via API, and although we move on to the next dataset when a job runs beyond its Refresh Wait Time, it would be helpful if there were a way to kill it.
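
    For illustration, the per-dataset Refresh Wait Time could be carried in the same driving CSV; the column names in this sketch are assumptions about the job config, not its actual layout:

```python
import csv

def load_job_config(path):
    """Read per-dataset Refresh Wait Times from the driving CSV.

    The column names (dataset_id, refresh_wait_mins) are illustrative;
    the real job config layout may differ.
    """
    with open(path, newline="") as f:
        return [
            {"dataset_id": row["dataset_id"],
             "wait_secs": int(row["refresh_wait_mins"]) * 60}
            for row in csv.DictReader(f)
        ]
```

    Each dataset's wait_secs can then be passed to the polling loop above in place of a global 30-minute budget.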

  • Josh Boxer (Administrator)

    Depending on the databases involved, you could discuss your use case with your Experian contact to see whether Pushdown processing could help do this more efficiently:

    https://docs.experianaperture.io/data-quality/aperture-data-studio-v2/extend-data-studio-functionality/pushdown-processing