Snowflake Issues with Dataset Creation
We have deployed the proper snowflake drivers and have successfully connected to a hosted warehouse. Initially we had some sporadic failures connecting, but feel it might be related to the "Auto Close" or "Auto Shutdown" options in Snowflake. We increased the timeout value on the connection string and it seems to have addressed those issues.
However, we are now running into some odd behavior when attempting to create datasets. Sometimes the process hangs during the calls against snowflake to obtain the list of columns. We initially believed it was just for one specific table, but now it appears that is happening for a few / several.
We can see the queries being executed on the Snowflake side and see that they are completing successfully, but Aperture doesn't respond afterwards. The log file doesn't seem to shed any light on significant issues but can share if it helps to troubleshoot.
Wondering if anyone has any tips or recommendations regarding Snowflake external systems that might help to resolve.
Answers
-
Hi George
Can you clarify "the proper snowflake drivers"? also "happening for several" does this mean it is successful for any?
1 -
You can turn on detailed logging for the connection by checking the Debug connections checkbox in the external system's config, which will indicate what queries are being run through the Snowflake driver and any errors.
You may also want to investigate creating your own SQL query to use when loading data into Data Studio. This gives you a lot more control over the volume of data you pull in and help to avoid connection timeouts \ closures if you're loading rows or columns you don't need to process.
2 -
@Josh Boxer , sorry for the delay; for some reason I didn't get an email notification that there were responses to this post.
"Can you clarify "the proper snowflake drivers"? also "happening for several" does this mean it is successful for any?"
We are using snowflake drivers that were downloaded and installed from the vendor's website (Central Repository: net/snowflake/snowflake-jdbc (maven.org)); specifically the 3.13.22 drivers which are being used successfully by other tools to connect to the Snowflake warehouse.
Creating datasets is successful for some tables, but even those which have been successfully loaded in the past, may still sporadically fail.
0 -
@Henry Simms , we do have the "Debug Connections" checkbox clicked but it doesn't seem to give us much in the log to shed light on the issue.
I haven't tried writing our own SQL query, but will do so for many of the tables failing and will report back.
Here are copies of the logs pulled around the time that we saw this behavior. cc @Josh Boxer in case you want to look at these logs.
0 -
@Henry Simms , just to report back regarding writing our own SQL; one of the tables that we have seen more consistent failure, works when we issue a query and select top 1 *, however it hangs when we try to pull all data. It is roughly 4.5k records, which to me is very small compared to some of the datasets we are pulling from other sources.
You can see from the following screen shot, but queries look to have completed successfully from the snowflake warehouse based on their history view, however only the top 1 allowed me to move forward in the dataset creation.
The other one is hanging at the screen that returns metadata and allows me to annotate columns.
0 -
I've been doing some testing with Data Studio and Snowflake. Data Studio actually now (since v2.7.1) includes a pre-packaged Snowflake JDBC driver, meaning you shouldn't need to download your own. Our documentation will be updated to reflect that this is now a native driver. The driver available in Data Studio is from Progress DataDirect and is labelled "Snowflake 6.0" in the DBMS list when configuring and External System:
Using this version of the driver, I've been fetching data from datasets in the SNOWFLAKE_SAMPLE_DATA database. So far I've not hit any problems, when repeatedly loading tables up to 6 million rows (TPCH_SF1. LINEITEM below), which loads into my Data Studio instance in around 2 minutes (my warehouse is in the UK_South Azure datacenter and I'm based in the UK).
When I switch on JDBC debug I get a lot of additional logging in datastudio.log, some example lines logged during a fetch for table "SUPPLIER", with timestamp and thread info removed:
JDBC_DEBUG: meta.getTables(SFSALESSHARED_SFC_SAMPLES_AZUKSOUTH_SAMPLE_DATA, TPCH_SF1, %, [TABLE]) ... JDBC_DEBUG: meta.getColumns(SFSALESSHARED_SFC_SAMPLES_AZUKSOUTH_SAMPLE_DATA, TPCH_SF1, SUPPLIER, null) ... JDBC_DEBUG: SELECT ALL Query: "SELECT "S_SUPPKEY","S_NAME","S_ADDRESS","S_NATIONKEY","S_PHONE","S_ACCTBAL","S_COMMENT" FROM "SUPPLIER""
I would recommend trying out the native driver to see whether it resolves your issues.
However, it is worth mentioning that I also did some testing using the
snowflake-jdbc-3.13.22.jar
driver in Data Studio, and was also successfully able to fetch a range of tables into Data Studio without any issues, so it's possible there's something else going on here.0 -
@Henry Simms, we are on version 2.6.3.62 so the native driver isn't available. We are going to log a request with support via email and will try get someone to help us narrow down the issue.
1