Best Of
Re: Where could I find more information on how to use "Query duplicates store"?
Hi Michael
Query duplicate store is used to query the results of duplicate matching where the UniqueID and ClusterID are not known but the name and contact details are. This is useful when new records are coming into the SCV from an external source that has no direct links with the SCV: you can query the SCV to identify records that already exist within it. Results from this query may link all of your records, none of them, or somewhere in between; it will depend on the data you are comparing. It is possible, for example, that only 1% of your data will be found in your match store.
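To make that concrete, here is a toy Python sketch of what the lookup amounts to (the store, the key and the record are entirely made up for illustration; the real step queries the duplicate store built by Find Duplicates):

    # Toy stand-in for the match store built by Find Duplicates; the real
    # step queries the duplicate store itself, not a dict like this.
    match_store = {
        ("jane smith", "jane.smith@example.com"): {"UniqueID": "A-17", "ClusterID": 101},
    }

    # Incoming record from the external source: no UniqueID or ClusterID,
    # just name and contact details.
    incoming = {"Name": "Jane Smith", "Email": "jane.smith@example.com"}

    key = (incoming["Name"].lower(), incoming["Email"].lower())
    print(match_store.get(key))  # the record's cluster, if it already exists in the SCV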
Some things to note on the configuration:
Make sure that the selected columns in the query match the number and order of those used in the Find Duplicates step, including the Unique ID column (it is OK to send in nothing for this).
For example, my Find Duplicates setup contains 29 columns.
I would then configure the query with all 29 columns, even if I don't have any data for them in my input file. You can create the columns you don't have using the Transform step with the Constant function to apply a null value, or you could make use of the Map to Target step to ensure you have the correct columns.
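As a rough illustration of the padding (a Python sketch with assumed column names; in Data Studio you would do this with Transform or Map to Target rather than code):

    # The query input must carry the same columns, in the same order, as
    # the Find Duplicates setup (29 of them in the example above).
    expected_columns = ["Unique ID", "Forename", "Surname", "Email"]  # ...through to column 29

    def pad_record(record):
        # Columns missing from the input file are filled with None, like
        # the Constant function applying a null value in a Transform step.
        return {col: record.get(col) for col in expected_columns}

    print(pad_record({"Forename": "Jane", "Surname": "Smith"}))
    # {'Unique ID': None, 'Forename': 'Jane', 'Surname': 'Smith', 'Email': None}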
Make sure you take a snapshot directly after the Query step. The Show Data button will only run 20 rows by default for performance reasons; using a snapshot and running the workflow directly ensures all rows are run.
Failing that, I would recommend finding easy records from your external system that you know exist in the SCV, joining on attributes such as Name and Email or Name and Phone, and making sure these are returned from the Query step to verify the configuration is correct.
Attached is the setup guide for the step, but it does have some basic usage in there too.
Re: Identification of an individual flat in a building using Address Validate
Thanks @Nigel Light. I am going to work with the team to get this working in V1 too. I will keep you posted on the progress!
Re: Delta Data Loads - Data Studio version 2.0
@Carolyn congratulations on going live!
Re: Edge
Great question @Nigel Light. We test, support and use Chrome, Edge and Chromium, so you should be good.
Re: Address Rationalisation
@Keith Alexander I think what you're after is the split function.
This takes an input string, a character to split on (a comma in your case), and then the item to return.
The example above shows this for Address Line 1. You would create a workflow with a Transform step that creates five columns (Address Lines 1-5); for each, it's just the Split function, with the original column as the input and the relevant line number to return.
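If it helps, here is a minimal Python sketch of what the Split function is doing for each line (the sample address is made up):

    def split_item(value, delimiter, item):
        # Mirrors the Split inputs: the string, the character to split on,
        # and the (1-based) item to return; empty when the item is missing.
        parts = [p.strip() for p in value.split(delimiter)]
        return parts[item - 1] if item <= len(parts) else ""

    address = "Flat 2, 10 High Street, Townsville, Countyshire, AB1 2CD"
    for n in range(1, 6):
        print(f"Address Line {n}:", split_item(address, ",", n))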
Re: Scheduling error
Hi @stevenmckinnon - have you opened a support ticket for this to be investigated?
Re: Scheduling error
Just to close this one off: we discovered that there was a Data Studio issue preventing a scheduled workflow from exporting back to a table in a SQL Server system that used NTLM authentication.
We were able to work around the problem by using a setting in Data Studio to enable some as-yet-unreleased functionality. This functionality will be enabled by default from v2.1.
Re: Edit Distance
Hi Patrick
I'm not going to pretend to understand the maths, but the prefix value gives higher ratings to strings that have a common prefix, up to a prefix length of 4.
The prefix weighting is applied over the prefix length that you supply and weights the results in favour of strings with common prefixes.
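For what it's worth, this matches the standard Jaro-Winkler prefix adjustment; here is a small Python sketch of the boost, assuming a base similarity score has already been computed:

    def prefix_boost(base_sim, s1, s2, prefix_weight=0.1, max_prefix=4):
        # Count the common prefix, capped at max_prefix characters (4 here).
        shared = 0
        for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
            if a != b:
                break
            shared += 1
        # A longer shared prefix pulls the score closer to 1.
        return base_sim + shared * prefix_weight * (1.0 - base_sim)

    # A 4-character common prefix lifts a base score of 0.90 to
    # 0.90 + 4 * 0.1 * 0.10 = 0.94
    print(prefix_boost(0.90, "MARTHA", "MARTIN"))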
Re: Anyone got the Azure Blob Storage Datasource connection working
Sorry, I confused myself: I read AWS S3 and that was what I was looking at.
I just tested this on v1.6.2 (it should still hold true for v1.6.3).
Re: Setting up custom Find Duplicates rules
Hi @Carolyn
Find Duplicates works by standardising data and mapping the fields you supply to standardised fields, so you are correct in surmising that the fields get individually mapped. We have a tool that can help you to see the workings of this process, called the Find Duplicates Workbench. Your local support team will be able to provide you with this tool and walk you through a few examples.
The good news is that all your requests are possible in Data Studio.
If you want to match on two fields combined, you could use the "Generic Field" option. Use a Concatenate function to combine the First and Middle Name into a single field in your dataset, and then use the Generic Field mapping in Find Duplicates to match on that field.
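As a quick illustration of that pre-processing step (Python, with assumed column names; in Data Studio this would be a Concatenate function in a Transform step):

    record = {"First Name": "Mary", "Middle Name": "Jane", "Surname": "Smith"}

    # Combine First and Middle Name into one field, which is then mapped
    # as the Generic Field in Find Duplicates.
    record["First Middle Name"] = " ".join(
        part for part in (record.get("First Name"), record.get("Middle Name")) if part
    )
    print(record["First Middle Name"])  # Mary Jane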
Once you have mapped the field, you'll need to modify your underlying rule set to include the new name rule. How you do that will depend on exactly which business rule you want to implement: do you want to replace the existing individual field name match, or add to it?
For your second question, I would again use the Find Duplicates Workbench tool to evaluate exactly why you are not getting good matches and to tweak the underlying individual rule set. Using this tool, it is quite straightforward to include an AND/OR rule for date of birth. It is possible to set up the rules so that an empty date of birth is treated as an exact match, which would then revert the rules to using the name and address fields.
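A rough sketch of that last behaviour (Python, illustrative only; in practice you would express this through the Workbench's rule editor rather than code):

    def dob_rule(dob_a, dob_b):
        # Treat a missing date of birth as Exact, so the overall decision
        # falls back to the name and address rules.
        if not dob_a or not dob_b:
            return "Exact"
        return "Exact" if dob_a == dob_b else "No match"

    print(dob_rule("1980-01-01", ""))            # Exact (falls back to name/address)
    print(dob_rule("1980-01-01", "1980-01-01"))  # Exact
    print(dob_rule("1980-01-01", "1975-06-30"))  # No match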
It is a bit complex to include detailed steps for this in the forum. It would be best if one of your local consultants could furnish you with a copy of the Find Duplicates Workbench and show you how to evaluate the rules and change them. I will contact your local team and try to arrange this, copying in @Ian Buckle and @Sean Edmunds as an FYI.
To give you an idea of what the Workbench can show you:
You can view the Standardised fields for any two records in the Find Duplicates store:
You can then view how the rules evaluate those two records and why they do/don't match:
In this example, you can see that the overall rule does not evaluate as Exact because the Forename rules do not evaluate as Exact.
You can then use the rules editor to tweak the rules:
You can also find detailed information on how Find Duplicates uses blocking keys and rules in our online docs:
https://www.edq.com/documentation/aperture-data-studio/find-duplicates-step/advanced-config/