Best Of
Re: Identification of an individual flat in a building using Address Validate
Thanks @Nigel Light. I am going to work with the team to get this working in V1 too. I will keep you posted on the progress!
Re: Delta Data Loads - Data Studio version 2.0
@Carolyn, congratulations on going live!
Re: Edge
Great question @Nigel Light. We test, support and use Chrome, Edge and Chromium, so you should be good.
Re: Address Rationalisation
@Keith Alexander I think what you're after is the split function.
This takes an input string, a character to split on (comma in your case) and then the item to return.
The example above shows that for Address Line 1. You would create a workflow with a transform step that creates 5 columns (Address Lines 1-5); for each one, it's just the split function, with the original column as the input and the relevant line number to return.
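If it helps to picture the logic, here is a rough sketch in plain Python (not Data Studio syntax; the sample address and the blank-padding behaviour are my assumptions):

```python
# Plain-Python sketch of the split logic, not a Data Studio transform.
def split_address(address, n_lines=5):
    """Split a comma-separated address into a fixed number of lines."""
    parts = [p.strip() for p in address.split(",")]
    parts += [""] * (n_lines - len(parts))  # pad so every record yields n_lines columns
    return parts[:n_lines]

for i, line in enumerate(split_address("10 Downing Street, Westminster, London, SW1A 2AA"), start=1):
    print(f"Address Line {i}: {line}")
```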
Re: Scheduling error
Hi @stevenmckinnon - have you opened a support ticket for this to be investigated?
Re: Scheduling error
Just to close this one off: we discovered a Data Studio issue preventing a scheduled workflow from exporting back to a table in a SQL Server system that used NTLM authentication.
We were able to work around the problem by using a setting in Data Studio to enable some as-yet-unreleased functionality. This functionality will be enabled by default from v2.1.
Re: Edit Distance
Hi Patrick
I'm not going to pretend to understand the maths, but the prefix value gives higher ratings to strings that have a common prefix, up to a prefix length of 4.
The prefix weighting is applied over the prefix length you supply, and weights the results in favour of strings with common prefixes.
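For what it's worth, that description matches the textbook Jaro-Winkler prefix adjustment; I'm assuming Data Studio follows the standard formula, and the numbers below are purely illustrative:

```python
# Standard Jaro-Winkler adjustment: sim_w = sim_j + l * p * (1 - sim_j),
# where l is the common prefix length capped at 4 and p is the prefix
# weight (conventionally 0.1). Assumed, not confirmed, to be what
# Data Studio does internally.
def winkler_adjust(jaro_sim, s1, s2, prefix_weight=0.1, max_prefix=4):
    l = 0
    for a, b in zip(s1, s2):
        if a != b or l == max_prefix:
            break
        l += 1
    return jaro_sim + l * prefix_weight * (1 - jaro_sim)

# "MAR" is common to both strings (l = 3): 0.9 + 3 * 0.1 * (1 - 0.9) = 0.93
print(round(winkler_adjust(0.9, "MARTHA", "MARHTA"), 2))
```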
Re: Anyone got the Azure Blob Storage Datasource connection working
Sorry, I confused myself. I read AWS S3 and that was what I was looking at.
I just tested this on v1.6.2 (it should still hold true for v1.6.3).
Re: Setting up custom Find Duplicates rules
Hi @Carolyn
Find Duplicates works by standardising data and mapping the fields you supply to standardised fields, so you are correct in surmising that the fields get individually mapped. We have a tool called the Find Duplicates Workbench that can help you see the workings of this process. Your local support team will be able to provide you with this tool and walk you through a few examples.
The good news is that all your requests are possible in Data Studio.
If you want to match on two fields combined, you could use the "Generic Field" option. Use a concatenate function to combine the First and Middle Name into a single field in your dataset, and then use the Generic Field mapping in Find Duplicates to match on that field.
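As a simple sketch of that concatenation (plain Python rather than a Data Studio transform; the sample values are made up):

```python
# Build a combined name field before mapping it as a Generic Field.
record = {"First Name": "Anne", "Middle Name": "Marie"}

# Skip blanks so a missing middle name doesn't leave a trailing space.
combined = " ".join(p for p in (record["First Name"], record["Middle Name"]) if p)
print(combined)  # "Anne Marie"
```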
Once you have mapped the field you'll need to modify your underlying rule set to include the new name rule. How you do that will depend on exactly what business rule you want to implement. Do you want to replace the existing individual field name match or add to it?
For your second question, I would again use the Find Duplicates Workbench tool to evaluate exactly why you are not getting good matches and to tweak the underlying individual rule set. Using this tool, it is quite straightforward to include an AND/OR rule for date of birth. It is possible to set up the rules so that an empty date of birth is treated as an exact match, which would then revert the rules to using the name and address fields.
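To make the "empty date of birth counts as an exact match" idea concrete, here is the decision logic as pseudocode; the actual Workbench rules are configured rather than coded, so treat this purely as an illustration:

```python
# Pseudocode for the rule described above: an empty date of birth is
# treated as an exact match, so matching falls back to name/address rules.
def dob_rule(dob_a, dob_b):
    if not dob_a or not dob_b:
        return "EXACT"  # a missing DOB never blocks a match
    return "EXACT" if dob_a == dob_b else "NO_MATCH"

def records_match(name_address_result, dob_a, dob_b):
    # AND the DOB rule with the outcome of the existing name/address rules.
    return name_address_result == "EXACT" and dob_rule(dob_a, dob_b) == "EXACT"

print(records_match("EXACT", "1980-01-01", ""))  # True: empty DOB treated as exact
```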
It is a bit complex to include detailed steps in this forum. It would be best if one of your local consultants could furnish you with a copy of the Find Duplicates Workbench and show you how to evaluate and change the rules. I will contact your local team and try to arrange this, copying in @Ian Buckle and @Sean Edmunds as an FYI.
To give you an idea of what the Workbench can show you:
You can view the Standardised fields for any two records in the Find Duplicates store:
You can then view how the rules evaluate those two records and why they do/don't match:
In this example you can see that the rule does not evaluate as Exact because the Forename rules do not evaluate as Exact.
You can then use the rules editor to tweak the rules:
You can also find detailed information on how Find Duplicates uses blocking keys and rules in our online docs:
https://www.edq.com/documentation/aperture-data-studio/find-duplicates-step/advanced-config/
Re: What is the status of data catalog integrations, for example Collibra?
Hi @Sami Laine. In our conversations with the folks at Collibra, we have determined that co-implementing Mulesoft together with Collibra and a platform like Data Studio creates a lot of implementation friction.
Accordingly, one of the more recent implementations of Data Studio with Collibra connects Data Studio to the Collibra platform directly, bypassing Mulesoft altogether. @Ivan Ng can provide you with more details; you'll also find details of that integration on the Collibra marketplace, accompanied by some overview slides.