Timeout during export

I have a workflow that exports to an external database, using the option to create a new table. There are approximately 500,000 records and 7 columns.

The export seemed to be writing extremely slowly, and Aperture timed out while it was writing. I had no way of telling what Aperture had actually written to the table, or whether it was still writing, because the settings were configured to commit changes at the very end. The monitoring section in Aperture did show a percentage complete, but this was still below 20% after around 20 minutes.

I cancelled the job and decided to write to a CSV on my local PC instead, in the hope that it would be quicker; the export to CSV completed in half an hour. The final file is around 49MB, which works out to roughly 280 rows per second, or under 30KB/s. Given that this is a simple text file, I was hoping it would be quite a bit faster than that. I checked to make sure this time was not taken up by processing steps in the workflow: the processing element had already finished, and only the write element was taking this long. I was also making an effort to ensure that Aperture didn't time out, as I don't know whether that would have an impact, and I didn't want to lose time by having to re-run a large export.

My questions are: does Aperture continue with exports or processing workflows even if the user has timed out or logged out of Aperture? Does it continue as long as the service is running? Is this write speed typical? Could it be made quicker?

Edit
I am having to re-run this workflow because some records have changed, so I have switched the export setting to "Update" changed records in the external database, to see whether this would be any quicker than extracting everything to CSV and importing it manually. Around 200k records need one text value changed. After around 90 minutes, the progress indicator has reached 3%; at that rate the full run would take around 50 hours. This is painful to see, and I think it needs to be addressed as soon as possible.

Answers

  • Clinton Jones, Experian Elite
    edited July 2019

    Performance problems are typically tied to the following issues:

    • Size of machine and competing processes and memory resources
    • Type of data source connection
    • Integration performance
    • Quality of disk

    The optimal configuration is covered under: https://www.edq.com/documentation/aperture-data-studio/technical-recommendations/
    It is fair to say that large jobs will take some time to run, either to load the data or to run through the data and then update the target, whether that is a direct database update or writing a file to disk.

    A file of 1 million rows with around a dozen columns should load and profile in Data Studio in around a minute on a very average laptop with around 16GB of memory, running the usual Microsoft Office applications and Google Chrome.

    Running that same data set through a combination of sorts, filters, transforms and so on, and then generating an export file, will take a few seconds longer from start to finish, but should complete in under two minutes on that same machine in a healthy state with SSD storage.
    Performance is fairly linear, though 15M rows will not necessarily take exactly 15 minutes to load and run; it might take a minute or two more or less, so plan accordingly.

    If you are working with databases and loading or updating is slow, this can be attributed to any number of reasons, the main ones being network latency and the relative proximity of the database. Another factor is the size of the update batches you are running: sensibly chunked batches run faster because they require fewer database commits, and a healthy database with up-to-date statistics and indexes will make extractions and updates run faster. The sketch below illustrates the batching point.
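
    To show why chunked commits matter, here is a minimal sketch of batched UPDATEs against a relational target, assuming a DB-API driver such as pyodbc; the table and column names are hypothetical examples, and Aperture's own writer handles this internally, so this is only to illustrate the cost of committing row by row versus per chunk.

    ```python
    # Sketch only: chunked UPDATEs with one commit per batch instead of per row.
    # Table/column names ("customer", "status", "id") are hypothetical examples.
    import pyodbc

    BATCH_SIZE = 5000  # larger batches mean fewer commits and fewer round trips

    def update_in_batches(conn_str, rows):
        """rows: iterable of (new_status, record_id) tuples."""
        conn = pyodbc.connect(conn_str)
        try:
            cursor = conn.cursor()
            cursor.fast_executemany = True  # pyodbc option: send parameters in bulk
            batch = []
            for row in rows:
                batch.append(row)
                if len(batch) >= BATCH_SIZE:
                    cursor.executemany(
                        "UPDATE customer SET status = ? WHERE id = ?", batch)
                    conn.commit()  # commit once per chunk, not once per row
                    batch.clear()
            if batch:  # flush any remaining rows
                cursor.executemany(
                    "UPDATE customer SET status = ? WHERE id = ?", batch)
                conn.commit()
        finally:
            conn.close()
    ```

    The same principle applies whichever tool issues the SQL: committing every row forces the database to flush its transaction log each time, whereas committing every few thousand rows amortises that cost across the whole chunk.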

    If you cannot diagnose the problem, contact your regional support team: https://www.edq.com/documentation/contact-support/
