Auto discovery of keys (foreign, primary)

Logging this idea on behalf of @AlanB (feel free to add further commentary)
What problem are you facing?
We are starting a migration project from a largely undocumented system and need to identify how the various tables relate to one another (e.g. what are the foreign / primary keys, potential joins as well as conducting bulk data profiling to understand the structural integrity of the data).
We understand Data Studio can help us iteratively evaluate joins within a workflow and can conduct data profiling at full volume, however it'd be great if the software could go further to intelligently identify potential joins between tables.
What impact does this problem have on you/your business?
The manual effort of identifying and verifying these relationships can take a long time.
Do you have any existing workarounds? If so, please describe those.
It’s something we can do manually with a bit of effort but having the feature would be very helpful.
Do you have any suggestions to solve the problem? Feel free to add images if this helps.
I have seen tools before that offer this capability which I believe work by accessing stored views and query history in the database to profile how tables are commonly joined to make recommendations on key columns.
Some of the big catalog tools have this feature too, so I’m sure it would be useful.
This is then often combined with profiling data e.g. high cardinality for key columns vs low cardinality for reference data.
Comments
-
This is a very interesting topic and certainly something I'd personally love to see in the product suite, too.
In cases where we may not have access to source system catalogues, or to deal with source databases that are poorly modelled, we've worked on custom utilities for this purpose in the past. Unfortunately, they have tended to be very computationally expensive; the fact that most commonly encountered PK / FK columns are integers and there are matches almost everywhere is a prime cause of this!
However, in cases where we do have access to source system catalogues (in the case of MS SQL Server, the INFORMATION_SCHEMA or the sys tables from which it is comprised), it can be quite simple to identify the PK / FK relationships in bulk - e.g., below for a SQL Server DB:
SELECT FK.name AS ForeignKey
, OBJECT_NAME(FK.parent_object_id) AS TableName
, COL_NAME(FC.parent_object_id, FC.parent_column_id) AS ColumnName
, OBJECT_NAME (FK.referenced_object_id) AS ReferenceTableName
, COL_NAME(FC.referenced_object_id, FC.referenced_column_id) AS ReferenceColumnName
FROM
sys.foreign_keys FK
JOIN
sys.foreign_key_columns FC ON FK.OBJECT_ID = FC.constraint_object_id0 -
Pandora used to have a feature to discover FKs, didn't need to access the DB schema . Am I remembering correctly?
0