How to promote from dev to prod environments when using different external systems
Problem:
I’ve recently been working with a client who had used their Data Studio development environment to create workflows based on data extracted from their development SQL Server database. They were in a position to promote those workflows to their production environment, and also to point the data sources at the live data in their production SQL Server DB (a separate instance from the dev database, with different credentials).
They wanted to follow the right steps so that any future updates to their workflows in the dev environment could easily be synchronized to production, without needing to edit workflows or reconfigure the external system each time.
Data Studio’s synchronization functionality allows you to do this, but it’s important to follow the steps laid out below.
To do this, we’ll make use of Data Studio’s metadata management capability: the ability to export any of the logic (dataset schemas, views, workflows, functions, external systems etc.) from one space to another. Note that no actual data will be exported or imported.
Scenario:
Environment 1 (DEV):
- External System connection to Azure SQL Server dev instance w/ dev credentials
- Space 1: Data Import
- Used to control access to source data
- Two datasets loaded from external system
- Two views created from these which are shared into…
- Space 2: Data Usage
- Contains workflow(s) that process the data in the views and export back to the external system
Environment 2 (PROD):
Desired State:
- External System connection to a different SQL Server DB (it must use the same JDBC driver)
- Identical Space / Dataset / View / Workflow structure
- Intention is to preserve links between dev and prod systems (to allow regular synchronization) without having to reconfigure the external system / credentials each time
Solution:
Initial set-up:
- To reach this desired state, the first thing you will need to do is a full export of the contents of each space you want to transfer from the DEV environment, including the external system and credential at least once (in my example, I export the external system and credential from the ‘Data Import’ space and just the workflow from the ‘Data Usage’ space - see the screenshots at the end for the specific exports I configured)
- Ensure that the same JDBC driver used by the DEV environment's external system is installed in the PROD environment
- Create spaces in the PROD environment to synchronize these .dmx files into. You may choose to use the same space names in Dev and Prod, but this isn’t a requirement; we would recommend using the same space structure
- Synchronize the space export that contains the external system and credential first, followed by the remaining spaces.
- NB: Synchronize retains the link between the source object's internal repository UUID and the corresponding UUID in the target repository. Import, by contrast, always imports objects as new and does not retain these links between systems
- Inspecting your spaces now should show everything is as it was in the DEV environment (including shared views / datasets / workflows etc.) with no errors reported in the workflows
- Finally, update the external system's connection properties to point to the PROD database, and fix up the credential used to access the external system (see the illustrative example below). It’s important at this stage to edit the existing external system and credential that you just synchronized, rather than creating new ones, as this maintains the link between the datasets and their source system. Refreshing the data in your newly synchronized datasets will now import data from the relevant tables in the production DB.
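For illustration, only the connection details and credential change between the two environments; the external system object itself (and its internal UUID) stays the same. Every hostname, database name and username below is hypothetical:

```
# DEV external system (illustrative values only)
jdbc:sqlserver://dev-server.database.windows.net:1433;databaseName=CustomerData_Dev
username: ds_dev_user

# PROD external system - same JDBC driver, different instance and credentials
jdbc:sqlserver://prod-server.database.windows.net:1433;databaseName=CustomerData_Prod
username: ds_prod_user
```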
At this point, you should have a working copy of everything from the DEV environment set-up in your PROD environment using your new external system / credential.
Subsequent synchronizations:
For the next planned promotion between DEV and PROD, all objects that need to be updated should be included in the export, EXCEPT any external systems that were already updated in the steps outlined above (any new external systems will need to be included and handled as described above)
You will observe that on sync, any links (e.g. datasets / export steps) to the external system in DEV automatically point to the corresponding (re-configured) external system in the PROD environment, without having to re-configure these every time.
Screenshots:
Initial DEV 'Data Import' space export screen:
Initial DEV 'Data Usage' space export screen:
Comments
-
Hello Matt, I hope you're doing well.
We have many issues when we export workflows from UAT and synchronize them into prod.
In order to follow your instructions we need to be in a healthy situation, so we want to refresh the UAT env with the prod data. After that we should have no more mapping issues when synchronizing our export.
How should we do this refresh? Is there a procedure? Which repositories should we copy from prod to UAT?
Thank you in advance for your help
Best regards
Ali
-
Hi Ali,
All good here, I hope you are well too. Apologies for the late reply; unfortunately I didn't get a notification for your comment.
Just to remind me - I think your situation is that you have your production and UAT systems in the desired state, but the links between environments that synchronising creates don't exist (you've been using 'import' and then manually fixing things). You'd like to essentially copy the working Prod system back to your UAT system and then use it as described above, synchronising any further changes in UAT back to Prod.
I think one approach we discussed was to copy the repository from your prod system into your UAT system (to keep all the production-approved objects) and then synchronise from this into a clean Data Studio 'environment' in your prod system (allowing you to keep both running until you were confident the new one was working, before deleting the original). I remember we discussed that a drawback to this approach would be the loss of anything in your UAT environment that had not yet been promoted to prod.
Is that what you would like to do?
An alternative approach would be to create the second prod 'environment' on your production Data Studio server (again, to keep a copy until you were happy with the new one) and synchronise everything you wanted from UAT, following the steps above to fix up the references to databases etc.
Let me know which of those approaches you would like me to detail - I'm also happy to jump on a call to discuss further if you would like (reaching out to me on email might be more reliable than on here, however)
Cheers,
Matt
-
Hi Matt,
You remember correctly. We want to copy Aperture's prod repository into the UAT one.
Could you please indicate which path we should copy from one environment to the other?
Best regards
-
Hi Ali,
The simplest explanation for copying the repo from one environment to the other is as follows:
- The metadata repository that the Aperture Data Studio service runs on top of is a SQLite database
- It contains definitions for all the objects - workflows / dataset schemas / users etc.
- It can be found in the \data\repository\ folder in your Data Studio database root directory (by default, C:\ApertureDataStudio\data\repository\repository.db)
- You will see a couple of other files here - repository.db-shm and repository.db-wal - that are created by SQLite (more information at https://sqlite.org/wal.html). I've never copied these files myself when migrating repository files, but it looks like best practice is to copy at least the '.db-wal' file alongside the .db file
- The simple process is then to copy these files from the old instance (prod), stop the server on the new one (UAT), replace the files for that installation, and then restart the server (see the sketch below)
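To make that concrete, here is a minimal sketch of the copy in Python. It assumes the default install path mentioned above, that the PROD repository folder is reachable from the UAT machine over a share, and a Windows service name you should verify on your own installation - treat every path and name as a placeholder:

```python
"""Illustrative sketch only: replace the UAT metadata repository with PROD's.

Assumptions (verify for your installation): the default repository location,
a network share exposing the PROD folder, and the Windows service name.
"""
import shutil
import subprocess
from pathlib import Path

PROD_REPO = Path(r"\\prod-server\ApertureDataStudio\data\repository")  # hypothetical share
UAT_REPO = Path(r"C:\ApertureDataStudio\data\repository")              # default location
BACKUP = UAT_REPO.parent / "repository_backup"

# The .db file plus the SQLite WAL/SHM files discussed above
FILES = ["repository.db", "repository.db-wal", "repository.db-shm"]

# Service name is an assumption - check services.msc on your server
SERVICE = "Aperture Data Studio Service"

# 1. Stop the Data Studio service on the UAT server
subprocess.run(["net", "stop", SERVICE], check=True)

# 2. Back up the existing UAT repository files before overwriting them
BACKUP.mkdir(exist_ok=True)
for name in FILES:
    if (UAT_REPO / name).exists():
        shutil.copy2(UAT_REPO / name, BACKUP / name)

# 3. Copy the PROD repository files into place
#    (the -wal / -shm files may not exist if SQLite has already checkpointed)
for name in FILES:
    if (PROD_REPO / name).exists():
        shutil.copy2(PROD_REPO / name, UAT_REPO / name)

# 4. Restart the service
subprocess.run(["net", "start", SERVICE], check=True)
```

Only copy the PROD files while the PROD service is stopped (or from a cold backup); copying a SQLite database mid-write can leave the .db and .db-wal files inconsistent with each other.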
There are some other things to consider around this that we could also help you with:
- If you want to move any data (e.g. any snapshot's data) along with the repository file
- If you want to make a back-up of your existing UAT repository / data folder
- If you wanted to clean down the data folder (caches etc.) of your UAT environment before starting the server with the new repository
- As I mentioned before, if you were certain that you wanted to essentially overwrite your UAT instance
Seeing as this is a somewhat unusual process, we're keen to support you with this - I'll reach out to you via email about the possibility of myself or Henry joining a call if you would like us to
I hope this helps,
Matt