Snapshot build taking nearly an hour

SpeedySteven Member
edited December 2023 in General

Hi - I have a table of approx 25M rows looking up to 2 other tables to retrieve 2 values. I then have a new column transformation with a simple if-then-else formula based on the two retrieved values. I'm creating a snapshot from this output and it is taking nearly an hour to create. Would this be expected?

Answers

  • Josh Boxer, Administrator
    edited August 2021

    Hi Steven, that sounds a little high, but it is within the rough expected time for 25M records (depending on machine spec and other jobs that might be running in the background). Looking at metrics from our performance tests, a Lookup takes roughly 35 mins for 100M records (this might depend on the chosen match type), so I would think the most likely culprit is the Transform with the if-then-else. You could test this by duplicating the workflow, removing the Transform step and running it again.
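
    As a rough sanity check on that benchmark, here is a minimal sketch that scales the quoted figure (roughly 35 mins per 100M records) down to 25M rows. It assumes Lookup time grows linearly with row count, which the thread does not confirm, and the function name is purely illustrative.

```python
def estimate_lookup_minutes(rows, ref_rows=100_000_000, ref_minutes=35.0):
    """Scale the quoted Lookup benchmark to another row count.

    Assumption: run time is proportional to the number of rows.
    """
    return ref_minutes * (rows / ref_rows)


lookup_minutes = estimate_lookup_minutes(25_000_000)
print(f"Estimated Lookup time for 25M rows: {lookup_minutes:.2f} min")
```

    Under that assumption a single Lookup over 25M rows would take around 9 minutes, so even two Lookups would sit well below an hour. That is in line with the suggestion to test the workflow without the Transform step.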

  • Henry Simms, Administrator
    @SpeedySteven If the lookup (reference) tables themselves are large and the values you're looking up have high cardinality (lots of unique values), you could see a performance improvement by switching to Join steps instead of Lookup steps.
  • Thanks for the feedback - I'll take a look and see what I get.

  • I've switched to a Join step and now have a straight join between table 1 (29M rows) and table 2 (4k rows). I'm expecting around 75M rows as a result. However, this is maxing out all 16 CPUs, getting 16% of the way through and then crashing the server. Something doesn't sound right. There are no other significant processes running.

  • This was with the match type set to "ignore datatype". With "exact" it processed OK.
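
    One possible reason a looser match type is so much heavier is that it can match more key pairs, so the join produces more rows. The sketch below counts inner-join output rows for two small key lists, once with exact comparison and once with keys normalised to strings. Treating "ignore datatype" as a string comparison is an assumption made for illustration, not documented Data Studio behaviour, and `join_row_count` is a hypothetical helper.

```python
from collections import Counter


def join_row_count(left_keys, right_keys, normalize=lambda k: k):
    """Count the rows an inner join on these keys would produce.

    Each left key contributes one output row per matching right key.
    """
    right_counts = Counter(normalize(k) for k in right_keys)
    return sum(right_counts[normalize(k)] for k in left_keys)


# Toy data mixing numeric and text versions of the same values.
left = [1, "1", 2, "2", 3]
right = [1, "1", 2, "x"]

exact_rows = join_row_count(left, right)                 # 1 and "1" stay distinct
loose_rows = join_row_count(left, right, normalize=str)  # 1 and "1" are treated as equal
print(exact_rows, loose_rows)
```

    The same counting approach can be used to check the expected result size (for example, the estimated 75M rows) on a sample of the real keys before running the full join.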

  • Josh Boxer, Administrator

    Hi Steven, this will be hard to diagnose, but from the little info available it could be a memory leak (which could cause Java to crash). There have been many stability and performance improvements in 2.2, 2.3 and 2.4 over the past 10 months.

    Is it possible to upgrade and try this again?

    If you still run into issues, then I'd suggest opening a support ticket so that support can help you gather the information required to look into this further for your specific setup: https://docs.experianaperture.io/more/contact-support/