Sharing database connections across teams
What's the recommended approach to sharing tables/views and connections to databases across a wider team? Should we create new connections per user, share tables with specific users or create credentials for other users on the same connection?
I'm interested in the security aspect but also the performance impact of each of these.
Clinton Jones Experian Elite
@Akshay Davis just a further comment on this topic.
Users should be actively DISCOURAGED from creating their own personal JDBC connections, and instead should always use a common JDBC connection but their own credentials. The reason for this, is that Data Studio has no way of 'knowing' that the JDBC connection created by User A is not the same as the JDBC connection created by User B. This is also one of the reason why we encourage the administrator to be the ONLY person creating JDBC or database connections. This approach means fewer JDBC connections and more efficient data caching.
If every user has their own JDBC connection and own credentials (or even a share credential) - they will all be continuously filling the cache with their own different but essentially duplicated view of data coming from the source(s). Thats inefficient and will slow things down unnecessarily.1
Henry Simms Experian Super Contributor
@Akshay Davis , building on what @Clinton Jones has already said, there are 3 basic approaches to sharing a database connection across members of a team.
- Each user configures their own connection independently
- Admin creates a connection with a single set of access credentials and provides access to named users
- Admin creates a connection with multiple credentials, and assigns users to those credentials
I would always recommend using the third option, assigning individual credentials to individual users, unless you have a good reason not to!
But let’s take a look at the pros and cons of each approach to explain what might affect your decision. First make sure you’re familiar with how Data Studio allows administrators to create and manage credentials, covered in the “restricting access” section of the user guide
1. Each user configures their own connection independently
This approach does not use Data Studio’s credentials at all. Each user creates a new data source using the “create new data source” button…
And then configures their own connection details, including credentials (username and password in this case), but also including a lot of technical detail about the name and location of the database and schema, setting specific to the connection (like security, timeouts), logging settings and so on:
- Each user completely manages their own connection and settings
- Mistakes likely to be made if the same full set of connection details have to be entered by each user
- If any connection detail change (for example switching to a new schema), it would need to be edited by each user
- No overall view or control by the Data Studio admins about who can access which connections, and which table and view filters apply. All access management would need to be done by the DBA
- Data Studio sees these as separate connections, which means that if one user has loaded a large table into Data Studio, it will not appear loaded for another users. As a result multiple copies of the same data could end up being loaded, impacting disk space using up processing resources on both the DB server and the Data Studio server
2. Admin creates a connection with a single set of access credentials and provides access to named users
With this approach we can give access to an existing individual or team, all using the same set of credentials (called the Default credentials)
In the Data Source connection, select users or teams and add access as required
- Users share connection settings, meaning any changes to the connection only need to be made in one place
- If one user loads a table, it is loaded for all users (however this also means that an individual user has no direct control over how often loaded data is refreshed, and it could happen without the knowing – see ‘cons’)
- The same access to tables is applied to all users. It’s can’t be granularly applied to differentiate one user’s access from another’s.
- Loaded data could be refreshed by one user without other users being aware.
3. Admin creates a connection with multiple credentials, and assigns users to those credentials
In this scenario the administrator defines a connection, either with admin login credentials to the DMBS, or optionally with no credentials at all:
They then create credentials individual to each user (or to a group of users). By doing this, each credential can define its own table visibility filter, making it easy for an administrator to modify access to certain tables without needing the DBA to modify user roles in the DBMS:
We then assign these credentials to users:
- Admins have a lot of flexibility around controlling what data access a Data Studio user has. Access can easily be changed or revoked
- Connection details are still only defined in a single location, making them simple to edit
- Existing credentials defined in the DBMS can be reused
- On-boarding new Data Studio users with the correct level of data source access is simple and transparent
- Each set of credentials effectively sets up a separate copy of the connection, so each user is able to control when they load and refresh data from the DBMS source (unless they share credentials)
- The only real con here is that only an administrator can set up new Data Studio credentials, so if users already have DBMS access via a username \ password that they use in other tools, they will need to physically sit with the administrator to enter these in the UI.
In summary, I would always recommend using the third option, assigning individual credentials to individual users, unless you have a good reason not to!6
Using data that comes from files is problematic. The main reason being that the data is stored in the user's IMPORT storage area on the server.
The administrator is user #5 and anything visible as MyFiles as the Administrator is contained in that folder structure
To share common data sets as files there is a hack, that has been documented internally but which is not really recommended.
A better approach in my opinion is to stage all data that needs to be used across multiple users, via a data connection, ie in a database table or on Azure or AWS folders.
You have the ability to set up seperate credentials for each user.
here is a link to the support documentation on this topic
Thanks @Clinton Jones to be clear, my question was really only referring to database connections and the performance impact of having users creating separate connections to the same database vs. having credentials added to one connection.
Is there any difference in performance or resource utilisation?
@Akshay Davis by having more granular credentials the dba or owner of the database has more control over the who and the what.
In terms of performance, the credntials are a passthrough when the credntials are a single set being used by multiple users, that might not pass audit-muster for a given customer situation and so i also don't recommend a shared credential BUT there are valid reasons where a share credntial or system account may be quite fine.
There should be no difference from a performance standpoint.