Relationship discovery using the Find Duplicates step in Aperture Data Studio
We are trying to find join candidates across tables and databases using the profiling metadata in Aperture Data Studio. As blocking keys for the Find Duplicates step, we chose attributes such as Most Common Format, Dominant Datatype, Standard Deviation, Average Length, Length Deviation, Frequency Deviation, and Format Frequency Deviation, as well as Aperture Tags.
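To make the idea concrete, the logic we are trying to reproduce is roughly the Python sketch below. It is not Aperture code and does not use any Aperture API: the profile rows, the field names, the `candidate_pairs` helper, and the `max_length_gap` threshold are all made up for illustration. It only shows blocking on two profiling attributes and then comparing a third within each block.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical profiling rows; every value here is invented for illustration.
profiles = [
    {"table": "orders", "column": "customer_id",
     "datatype": "Integer", "format": "99999", "avg_length": 5.0},
    {"table": "customers", "column": "id",
     "datatype": "Integer", "format": "99999", "avg_length": 5.0},
    {"table": "customers", "column": "email",
     "datatype": "Alphanumeric", "format": "aaaa@aaaa.aaa", "avg_length": 13.0},
]


def candidate_pairs(profiles, max_length_gap=1.0):
    """Block on (datatype, format), then keep cross-table pairs
    whose average lengths differ by at most max_length_gap."""
    blocks = defaultdict(list)
    for p in profiles:
        # Blocking key: only columns sharing this key are ever compared.
        blocks[(p["datatype"], p["format"])].append(p)

    pairs = []
    for members in blocks.values():
        for a, b in combinations(members, 2):
            if a["table"] == b["table"]:
                continue  # a join candidate must span two tables
            if abs(a["avg_length"] - b["avg_length"]) <= max_length_gap:
                pairs.append((f'{a["table"]}.{a["column"]}',
                              f'{b["table"]}.{b["column"]}'))
    return pairs


print(candidate_pairs(profiles))
```

In this sketch a column pair is lost as soon as its blocking key differs at all, which is one reason we suspect our choice of blocking keys matters so much.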
However, the predictions from Find Duplicates are not accurate enough. We think the reasons are that the rules for Find Duplicates are very basic and that the blocking keys need to be reconsidered. Could you please help us improve the accuracy?
I tried to drag our dataset and rules file into this post, but it showed "Request failed with status code 403". How can I share the files with you so that you can better understand the problem?
Thanks for your time.