Impact of snapshots on memory

Nigel Light Contributor
edited December 2023 in General

Hi,

I am using a job that has several input snapshots defined in it, though I only connect two of them each time I run the job. It seems to be using a lot of storage (and slowing Aperture down). Is this expected? Is it something to do with caching?

My thinking is that I should keep the number of snapshots in a job to a minimum. Does this apply to input data sources too, i.e. is it best to delete them rather than just leaving them to 'hang around'?

Also, how do I go about deleting snapshots when they are no longer required? Can anybody advise?

Thanks - and Happy Christmas

Nige

Answers

  • Tanj Jagpal Administrator
    edited December 2019

    @Ian Thornton @Ian Hayden is this something you could help Nige with?

    Merry Xmas to you and the team Nige!!

  • Thanks @Tanj Jagpal

    NB: I found where to delete snapshots. They are cunningly hidden under the Monitoring icon.

    Nige

  • Tanj Jagpal Administrator

    @Adrian Westlake, maybe something to consider for v2 (if not already!)

  • Clinton Jones Experian Elite
    edited December 2019

    @Nigel Light, here is the documentation for snapshots. As you rightly discovered, you can delete snapshots; more importantly, you can limit the number of snapshots that Data Studio stores.

    The vital statistics of snapshots are visible under Monitoring, and snapshots can be deleted from there.


  • Thanks @Clinton Jones

    I'm guessing there is no way of knowing, when developing a subsequent job, which version of the snapshot you are actually reading?

    (These are not dynamic, i.e. a follow-on step would select the most recent snapshot at the time the job was created, and if the feeder step had subsequently been rerun you would not get the latest version.)

    It would be useful if this could be incorporated in v2: either the option to make this dynamic, or some indication of the version, particularly during the development stage when this might be fluid.

    Nige

  • Clinton Jones Experian Elite

    @Nigel Light, there are three snapshot workflow steps:


    The first one is obvious: it is the action of saving a snapshot.

    The second is also obvious, it is the latest snapshot that you have of that particular snapshot range

    The third is perhaps a little less obvious if you haven't played with it: a value of '0' will show all the known content for a given snapshot, and '2' will give the latest two.

    It gives you the option to see the latest snapshot or the accumulated results of a number of snapshots, and to time-box some of the output.

    As an example, with the value '2':

    as compared with '0', meaning all snapshot rows:

    Here you see I have 4 runs of 22 rows each.
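    To make the range semantics concrete, here is a conceptual sketch in Python. It is only a model of the behaviour described above, assuming each run of the generating workflow is numbered in increasing order; `SnapshotRun` and `use_snapshot_range` are hypothetical names, not part of Data Studio or any Experian API.

```python
from dataclasses import dataclass


@dataclass
class SnapshotRun:
    run_id: int   # hypothetical: increases with each run of the generating workflow
    rows: list    # the rows captured in that run


def use_snapshot_range(runs, n):
    """Model of the range setting: rows from the latest n runs; n == 0 means all runs."""
    ordered = sorted(runs, key=lambda r: r.run_id)
    selected = ordered if n == 0 else ordered[-n:]
    # Accumulate the rows of every selected run into one result set
    return [row for run in selected for row in run.rows]


# Four runs of 22 rows each, mirroring the example in the post
runs = [SnapshotRun(run_id=i, rows=[(i, j) for j in range(22)]) for i in range(1, 5)]
print(len(use_snapshot_range(runs, 2)))  # 44 rows: the latest two runs
print(len(use_snapshot_range(runs, 0)))  # 88 rows: all four runs
```

    In this model a range of 1 behaves like the "latest snapshot" step, returning only the rows of the most recent run.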

  • Hi Clinton

    Doesn't 'Latest Snapshot' take the snapshot from Job 1 as it was at the time Job 2 was created?

    So, if Job 1 is subsequently rerun, the developer has to reattach the 'Latest Snapshot' node to pick up the output from the most recent run of Job 1?

    It sounds like I'll have to play with 'Use Snapshot Range' to obtain the most recent run of Job 1 every time I run Job 2.

    Thanks

    Nige

  • Clinton Jones Experian Elite

    Latest snapshot is the last snapshot generated.

    If you have a job scheduled to run at noon every day and it takes, say, 10 minutes, then at 12:01, while the job is still running, the snapshot has not yet been finalized, so you will see yesterday's snapshot.

    This is also why it is better not to have snapshot dependencies in parallel flows within the same workflow.

    You should always use a separate workflow, or cascade the second piece of logic from the output node of the snapshot in your workflow.
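    The noon example can be modelled in a few lines of Python. This is a conceptual sketch, assuming "latest" means the most recently finalized snapshot as of the moment you read it; `Snapshot` and `latest_snapshot` are hypothetical names, not Data Studio's API or its actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Snapshot:
    started: datetime
    finalized: Optional[datetime]  # None while the generating job is still running


def latest_snapshot(snapshots, at):
    """Most recent snapshot that had been finalized at or before time `at`."""
    done = [s for s in snapshots if s.finalized is not None and s.finalized <= at]
    return max(done, key=lambda s: s.finalized, default=None)


# Daily noon job taking 10 minutes (illustrative dates)
yesterday = Snapshot(datetime(2019, 12, 23, 12, 0), datetime(2019, 12, 23, 12, 10))
today = Snapshot(datetime(2019, 12, 24, 12, 0), datetime(2019, 12, 24, 12, 10))
snaps = [yesterday, today]

print(latest_snapshot(snaps, datetime(2019, 12, 24, 12, 1)) is yesterday)  # True: today's run not finalized yet
print(latest_snapshot(snaps, datetime(2019, 12, 24, 12, 11)) is today)     # True
```

    A dependent flow running in parallel within the same workflow would read at a time like 12:01 and get the previous snapshot, which is why cascading from the snapshot's output node or using a separate workflow avoids the race.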


  • Hi,

    I'm using separate workflows, but the latest snapshot seems to be the last one at the time of running and isn't dynamic, i.e. if Job 1 is subsequently rerun, the new output won't be picked up in Job 2 without deleting and re-adding the snapshot step in the job.

    If I use a snapshot range and generate more snapshots, will this create a higher snapshot number each time? So, if I want the most recent one, will I need to amend the snapshot number?

    Off now - mince pies and turkey are calling. Catch up on the other side... and hope you and all readers have a good Christmas

    Nige


  • Clinton Jones Experian Elite

    This sounds like a timing issue. How soon after the run of workflow #1 are you running workflow #2?

    Effectively, the snapshots are stored with every row in a run carrying a date and timestamp, although I believe a number is also stored somewhere in the repository.

    You shouldn't generally have to mess with numbers.

    Using the latest snapshot step should give you the snapshot that was generated the last time the generating workflow ran.

    If it isn't then there is something else going on...


    Have a good break and let's pick this one up in the New Year!

  • Thanks @Clinton Jones - this was my understanding, but it doesn't match our experience. Nige

  • Clinton Jones Experian Elite

    @Nigel Light, I would recommend opening a support case to get this investigated further.