Error attempting to delete match store

Shreya (Member)
edited December 2023 in General

Hi,

In my project, we are using Experian Aperture Data Studio (2.10.10) to connect to a remote Find Duplicates match store (3.8.15) over HTTP.

The Find Duplicates step is configured with 'Clear and re-establish store' checked, as part of our requirements. The majority of the time the process runs end to end successfully; however, we sometimes get the error below in the Find Duplicates step, where deleting the match store fails:

2023-05-26 03:47:44,947 ERROR c.e.d.m.MatchApiImpl [workpool-server-fixmem-executor-closer-862] Error attempting to delete match store:
com.experian.match.api.client.ApiException: {"matchStoreId":"Dupstore_Deduplication_RemoteVM","matchStoreLocation":"E:\ApertureDataStudio\data\experianmatch","state":"FAILED","progress":100.0,"message":"Find Duplicates step failed. java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]","createTime":"2023-05-25T03:12:05.465109500","startTime":"2023-05-25T03:13:33.728723900","finishTime":"2023-05-26T03:47:45.139292"}

As a workaround, we are currently restarting the remote Find Duplicates service and manually clearing out the match store from Data Studio to confirm that clearing the duplicate store is working as expected. This process, albeit manual, works for now.
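
In case it is useful, below is a rough sketch of how this interim workaround could be scripted (Python). The Windows service name, host and port are placeholders for illustration only and would need to match your own environment; the idea is simply to restart the remote Find Duplicates service and wait for its port to accept connections again before the workflow is rerun.

    # Hypothetical automation of the manual workaround (all names are placeholders).
    # This would have to run on the host where the Find Duplicates service is installed.
    import socket
    import subprocess
    import time

    SERVICE_NAME = "ExperianFindDuplicates"  # assumption: check the real service name in services.msc
    HOST, PORT = "localhost", 8080           # assumption: port comes from -Dserver.port in the .ini

    def restart_service():
        # Restart the Windows service via PowerShell's Restart-Service cmdlet
        subprocess.run(
            ["powershell", "-Command", f"Restart-Service -Name '{SERVICE_NAME}'"],
            check=True,
        )

    def wait_for_port(timeout_seconds=300):
        # Poll until the service accepts TCP connections again, or give up
        deadline = time.time() + timeout_seconds
        while time.time() < deadline:
            try:
                with socket.create_connection((HOST, PORT), timeout=5):
                    return True
            except OSError:
                time.sleep(10)
        return False

    if __name__ == "__main__":
        restart_service()
        if not wait_for_port():
            raise SystemExit("Find Duplicates service did not come back up in time")
        print("Service is reachable again; the workflow can be rerun")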

Could someone let me know if there is a permanent fix for this issue, or how to resolve it so that manual intervention won't be required in the future?

Answers

  • Josh Boxer (Administrator)

    Hi Shreya

    Our dev team thinks this error is unexpected. If you run into it again, could you please share your Find Duplicates log file (and possibly your Data Studio log file as well)? The easiest way to do this is to contact our Support team, who can also help you or your administrator locate these files if needed: https://docs.experianaperture.io/more/contact-support/

  • Shreya (Member)

    We have upgraded Experian Aperture Data Studio (2.10 to 2.13.6) and the remote Find Duplicates server (3.8.15 to 3.10.2), but we still get recurring errors while deleting the match store as a pre-step before running the Find Duplicates step.

    On checking the duplicate store, we get an error saying the duplicate store could not be found.

    To resolve this, we currently restart the remote Find Duplicates service, manually refresh the match store from Data Studio to confirm that connectivity to the duplicate store is working as expected, and rerun the workflow, which then works.

    Could someone let me know if there is a permanent fix for this issue, or how to resolve it so that manual intervention won't be required in the future? Please find attached the error screenshot for details.

  • Mirjam Schuke (Administrator)

    Hi Shreya, we will need to look into this, as we haven't heard of this being a common issue. Could you please contact support first so they can try to reproduce it and gather logs, as we will need more information? You can find regional support contacts via the same link that Josh posted above. Thanks.

  • SCH (Member)
    edited April 25

    Hi team,

    I am running into the same issue. I have tried re-creating the duplicate store; the job runs fine a few times and then fails, seemingly unable to locate the duplicate store. Restarting the duplicate server service fixes it, but we don't have a fix from the product team yet. I am still working with support on it.

    Our duplicate stores sit on the D: drive of the Find Duplicates server; that's the only non-standard thing I could think of.

  • Josh Boxer (Administrator)

    @SCH what version of Data Studio and Find duplicates are you using?

    It was found during investigation that this occurs with stores that have had problems during creation, either from running low on resources or from a job being stopped. Firstly, though, you might want to upgrade to the latest versions and also keep an eye on resources running low.
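
    If it helps, below is a very rough sketch (Python, using the psutil package) of the kind of lightweight resource logging you could run on the Find Duplicates server to spot memory or disk pressure around the time a store is created. The drive letter and polling interval are examples only.

        # Example resource logger for the Find Duplicates server (requires: pip install psutil)
        import datetime
        import time

        import psutil

        MATCH_STORE_DRIVE = "D:\\"   # example: the drive holding the match stores
        INTERVAL_SECONDS = 60

        while True:
            mem = psutil.virtual_memory()
            disk = psutil.disk_usage(MATCH_STORE_DRIVE)
            print(f"{datetime.datetime.now().isoformat()} "
                  f"RAM used: {mem.percent}%  "
                  f"disk used on {MATCH_STORE_DRIVE}: {disk.percent}%")
            time.sleep(INTERVAL_SECONDS)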

  • SCH (Member)

    We are on the latest versions:

    ADQ: 2.13.8.173

    FDS: 3.10.5

  • Josh Boxer (Administrator)

    Thanks for confirming. Let me see if I can find the Support case you raised.

    Some info on system resourcing/requirements that might be helpful: https://docs.experianaperture.io/data-quality/aperture-data-studio-v2/find-duplicates-step/installing-a-separate-instance/#customize-your-find-duplicates-service-configuration

  • SCH (Member)
    edited April 26

    @Josh Boxer: I have confirmed that our separate instance is installed with 16 vCPUs and 128 GB of RAM. I checked your link and want to add the JVM RAM parameters; could you confirm how to do it?

    This is the existing entry in the find duplicates.ini file:

    Virtual Machine Parameters=-Dlogging.config=log4j2.xml -Dspring.config.name=find_duplicates -Dmatch.database.path.windows="D:\ApertureDataStudio\data\experianmatch" -Dserver.port="8080" -Dmatch.maximum.cluster.size="500" -Dmatch.standardize.url="http://localhost:5000"

    Would this need to change to:
    Virtual Machine Parameters=-Dlogging.config=log4j2.xml -Dspring.config.name=find_duplicates -Dmatch.database.path.windows="D:\ApertureDataStudio\data\experianmatch" -Dserver.port="8080" -Dmatch.maximum.cluster.size="500" -Dmatch.standardize.url="http://localhost:5000" -Xms32g -Xmx64g
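
    For context, -Xms and -Xmx are the standard JVM options for the initial and maximum heap size, so on a 128 GB machine -Xmx64g would cap the heap at roughly half the RAM, leaving the rest for the OS and off-heap memory. While waiting for confirmation, this is the kind of quick sanity check I could run before restarting the service (Python, using psutil; the .ini path below is only an example and will differ per install):

        # Sanity-check the -Xmx value in the find duplicates .ini against physical RAM
        # (example only; adjust INI_PATH to wherever the file lives on your server).
        import re

        import psutil

        INI_PATH = r"D:\ApertureDataStudio\find duplicates.ini"  # assumption: example path only

        with open(INI_PATH, encoding="utf-8") as f:
            ini_text = f.read()

        match = re.search(r"-Xmx(\d+)g", ini_text, flags=re.IGNORECASE)
        if not match:
            raise SystemExit("No -Xmx<N>g setting found in the Virtual Machine Parameters line")

        xmx_gb = int(match.group(1))
        total_gb = psutil.virtual_memory().total / (1024 ** 3)

        print(f"-Xmx is {xmx_gb} GB; the server reports {total_gb:.0f} GB of RAM")
        if xmx_gb > total_gb * 0.75:
            print("Warning: a heap above ~75% of physical RAM leaves little room for the OS and off-heap memory")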

  • Shreya (Member)

    Hi,

    Below are the VM configurations we are using and the JVM parameters (the highlighted JVM properties were added following earlier discussions with the product support team).

    Find Duplicates (remote)

    Memory config: 32 GB RAM and 600 GB HDD

    Virtual Machine Parameters=-Dlogging.config=log4j2.xml -Dspring.config.name=find_duplicates -Dmatch.database.path.windows="E:\ApertureDataStudio\data\experianmatch" -Dserver.port="8443" -Dmatch.maximum.cluster.size="500" -Dmatch.standardize.url="http://localhost:5000" -Xms8g -Xmx20g

    Experian Aperture Data Studio server

    Memory config: 32 GB RAM and 600 GB HDD

    Virtual Machine Parameters= -Djava.locale.providers=CLDR,COMPAT -Djavax.net.ssl.trustStore="E:\ApertureDataStudio\certificates\cacerts" -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+HeapDumpOnOutOfMemoryError -Xmx20G

    Please note that we are not running any other applications except the Experian ones on either server. Please let me know if any configuration change is required.