Guidance Needed: Transitioning Data Sources from HD Insights to Databricks
Hi Team,
We are anticipating changes in our datastore architecture, which will significantly impact datasets and views currently sourced from HD Insights. These sources will need to be repointed to Databricks as part of the transition.
Could you please advise on the possible solutions or best practices for handling this migration?
Looking forward to your suggestions.
Comments
First, the Compare step is useful for verifying that data has been migrated successfully.
Second, in v2.16.4 we added the ability for multiple Sources to write to a Dataset:
https://community.experianaperture.io/discussion/1508/aperture-data-studio-2-16-4
So you can add a Databricks table as a new Source to a Dataset that is still receiving data from HD Insights.
Manually adding a new source to each Dataset post-migration isn’t feasible in our case, as we’re dealing with hundreds or even thousands of Datasets that need to be repointed from HD Insights to Databricks.
Has anyone encountered a similar large-scale migration scenario before?
Is there any workaround or automation support that Aperture can provide to streamline this process?
Any guidance or shared experiences would be greatly appreciated!
Hi Josh,
I am writing to follow up on Sneha's support request, submitted a few weeks ago, regarding the large-scale migration of datasets from HD Insights to Databricks.
Currently, our Databricks setup is connected to the Hive metastore. However, within the next 6 to 8 months we plan to transition to Unity Catalog, which will also involve a change to the database name.
Given the scale of our migration, manually repointing each dataset to the new source system is not feasible, as we are dealing with hundreds or even thousands of datasets.
Could you please provide any guidance or share experiences related to automated processes for repointing datasets to a new source system?
I hope my request is clear. I am happy to set up a call to explain this in more detail if needed.
Thanks,
Jencil.
Sorry, I can't think of a simple way to do this. The new Sources functionality saves you from having to update Workflows and remap them to new Datasets, but it is not currently possible to manage Sources through an API (though that might become possible in around 6 months).
Maybe someone else has a different solution for handling this.
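While Sources cannot be managed through an API, one part of this migration that can be scripted ahead of time is the inventory of old and new table names, which would be needed anyway for adding the new Sources and for pairing tables in the Compare step. Below is a minimal Python sketch, assuming tables are currently referenced as two-part `schema.table` names (or as `hive_metastore.schema.table`) and will become Unity Catalog three-part `catalog.schema.table` names. The catalog name `main`, the schema names, and every helper function here are hypothetical, and nothing in it calls Aperture Data Studio.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class TableRef:
    """A table reference: two-part (schema.table) or three-part (catalog.schema.table)."""
    catalog: Optional[str]
    schema: str
    table: str

    def qualified(self) -> str:
        # Join only the parts that are present, e.g. "main.sales.orders"
        parts = ([self.catalog] if self.catalog else []) + [self.schema, self.table]
        return ".".join(parts)


def parse_ref(ref: str) -> TableRef:
    """Split a dotted table name; assumes no dots inside quoted identifiers."""
    parts = ref.split(".")
    if len(parts) == 2:
        return TableRef(None, parts[0], parts[1])
    if len(parts) == 3:
        return TableRef(parts[0], parts[1], parts[2])
    raise ValueError(f"unexpected table reference: {ref!r}")


def to_unity_catalog(ref: str, catalog: str, schema_map: Dict[str, str]) -> str:
    """Map an old reference to a three-part name, renaming the schema if it is listed."""
    old = parse_ref(ref)
    new_schema = schema_map.get(old.schema, old.schema)  # unchanged if not renamed
    return TableRef(catalog, new_schema, old.table).qualified()


def build_repoint_plan(
    refs: Iterable[str], catalog: str, schema_map: Dict[str, str]
) -> Dict[str, str]:
    """Return {old reference: new reference} for every table in the inventory."""
    return {ref: to_unity_catalog(ref, catalog, schema_map) for ref in refs}


if __name__ == "__main__":
    # Hypothetical inventory of tables currently feeding Aperture Datasets
    inventory = ["sales_db.orders", "sales_db.customers", "hr_db.staff"]
    plan = build_repoint_plan(inventory, catalog="main", schema_map={"sales_db": "sales"})
    for old, new in plan.items():
        print(f"{old} -> {new}")
```

Running it prints `sales_db.orders -> main.sales.orders`, `sales_db.customers -> main.sales.customers` and `hr_db.staff -> main.hr_db.staff`. The resulting plan could be exported (for example to CSV) as a checklist for whoever adds the new Sources, so the renamed database only has to be specified once in `schema_map` rather than per Dataset.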