How long can/should the "Analyze blocking keys" function take?

BAmos
BAmos Member
edited December 2023 in General

Morning all,

I'm getting up and running with the find duplicates feature and have created my own blocking keys and matching rules, which are producing good results for the clustering. However, when I try to analyse them in "Find Duplicates Workbench", the "Analyze blocking keys" tool takes an extremely long time to process and fails after about an hour without giving results. I've tried analysing the inbuilt settings found in the Step Settings area and these worked quickly (a minute or two).

So my key questions are:

  1. Do custom configurations take more processing time to analyse?
  2. Are there any types of blocking key (size, data type, complexity) that add particular strain to the "Analyze blocking keys" tool?
  3. Is this a common problem, or could it be to do with our setup (we have a server hosting Aperture and the Workbench)?

Any and all help will be greatly appreciated!

Best Answers

  • Josh Boxer
    Josh Boxer Administrator
    Answer ✓

    Hello

    1) No difference between custom and packaged configurations. 

    2) Analysing large stores will take time as all Standardized records are read and blocking keys computed to perform analysis.

    3) It could be that your server is not powerful enough. Some technical recommendations can be found here:
    https://docs.experianaperture.io/data-quality/aperture-data-studio-v2/set-up/technical-recommendations/

    The best way to test this (for free!) is to reduce the volume of data by half and see whether you still hit a timeout.

    Some info on optimizing keys that might be helpful in future:
    https://docs.experianaperture.io/data-quality/aperture-data-studio-v2/find-duplicates-workbench/analyze-blocking-keys/
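For anyone wanting to script the "reduce the volume by half" test from point 3 before loading the data, here is a minimal sketch. It assumes the source data is available as a CSV export with a header row; the function name and file paths are hypothetical, and this is plain Python rather than anything built into Aperture.

```python
import csv


def write_every_other_row(src_path, dst_path):
    """Copy the header plus every second data row from src_path to dst_path.

    Returns the number of data rows written.
    """
    written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input file, nothing to copy
        writer.writerow(header)
        for index, row in enumerate(reader):
            # Keep rows 0, 2, 4, ... so the sample spans the whole file
            if index % 2 == 0:
                writer.writerow(row)
                written += 1
    return written
```

Taking every other row, rather than just the first half, keeps the sample spread across the whole file. That may matter if the export is sorted by something like postcode.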

  • Josh Boxer
    Josh Boxer Administrator
    Answer ✓

    Something else I remember from quite a while ago. Do your key descriptions contain spaces, like this:

    [
    {
    "description": "Full Post Code",

    rather than:

    [
    {
    "description": "FullPostcode",

    If so, can you remove the spaces and try again?
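If there are a lot of keys to check, the check above can be scripted. This is a rough sketch only. It assumes the blocking key configuration is a JSON array of objects that each have a "description" field, as in the fragment above. The function names and the sample key "Name_And_Town" are made up for illustration, and any other fields are passed through untouched.

```python
import json


def descriptions_with_spaces(blocking_keys):
    """Return the descriptions that contain any whitespace."""
    return [
        key["description"]
        for key in blocking_keys
        if any(ch.isspace() for ch in key.get("description", ""))
    ]


def strip_description_spaces(blocking_keys):
    """Return a copy of the keys with whitespace removed from each description."""
    cleaned = []
    for key in blocking_keys:
        fixed = dict(key)  # shallow copy so the input list is left unchanged
        if "description" in fixed:
            fixed["description"] = "".join(fixed["description"].split())
        cleaned.append(fixed)
    return cleaned


# Hypothetical configuration fragment: only "description" is taken from the thread
example = json.loads(
    '[{"description": "Full Post Code"}, {"description": "Name_And_Town"}]'
)
print(descriptions_with_spaces(example))  # ['Full Post Code']
print(json.dumps(strip_description_spaces(example), indent=2))
```

Underscores are left alone, so only descriptions with actual spaces get flagged or changed.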

Answers

  • BAmos
    BAmos Member

    Thank you Josh, I'll have a look at the technical specs you've linked and see if we have the right config and resources assigned.

  • BAmos
    BAmos Member

    One of them does, the rest use underscores throughout. I'll remove the spaces and give that a try. Is that a known best practice?

  • Josh Boxer
    Josh Boxer Administrator

    The underscores are fine. The spaces might trigger an old bug in 'Analyze blocking keys', which we will investigate. Let me know if it does fix your issue.

  • BAmos
    BAmos Member

    Hi there,

    Just wanted to come back on the spaces issue for the 'Analyze blocking keys' function. After removing the spaces it worked straight away, so it looks like it was the old bug that you mentioned. Please let me know if you need any more details from me to investigate.

    Thanks!

    Ben

  • Josh Boxer
    Josh Boxer Administrator

    Hi Ben

    Thanks for letting us know, pleased that it is now working. The bug will be fixed to prevent this happening in future.