Anyone got any ideas? I was getting the problem on the postcode column, so I ran a length check, restricted the column to a maximum of 9 characters, and converted anything containing just spaces to NULLs, hence the "New Postcode" column.
This bit also caught my eye:
It looks like the postcode column has some unusual numeric values. Scale in this case is the number of digits to the right of the decimal point, so a number with scale 32,531 suggests some massive number of decimal places.
Most likely this is the result of parsing a value that looks like a scientific number in 'e' notation, for example 1E-32531. Such a value will be parsed as 0.00000...00001, with 32,530 zeros after the decimal point. In fact this is a number too large for Data Studio to handle sensibly, and you will see it represented in a grid by 0, with length 1. However, there appears to be an issue where, despite the length being reported as 1, the "in length range" check fails to return the expected result.
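To see why a value like this is so awkward, here's a rough illustration in plain Python (not Data Studio's actual parser): an arbitrary-precision decimal keeps the enormous negative exponent, while a 64-bit float simply underflows to zero, which matches the grid showing 0 with length 1.

```python
from decimal import Decimal

# A scientific-notation string that slipped into the postcode column.
raw = "1E-32531"

# Parsed as an arbitrary-precision decimal, the value keeps its huge
# negative exponent: 32,530 zeros would follow the decimal point.
d = Decimal(raw)
print(d.as_tuple().exponent)   # -32531

# Parsed as a 64-bit float it underflows to plain zero, which is why
# the grid displays 0 with a reported length of 1.
f = float(raw)
print(f)                       # 0.0
print(len(str(int(f))))        # 1
```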
Here's an example - My input data includes:
We can see that the length for the value in row 13 is reported as 1, but when I check whether the length is between 1 and 6, the check fails. Compare with row 19, which behaves as expected.
We'll investigate this issue as a priority with our engineering team, and update you when we have more.
I would suggest turning off the Auto-parse scientific numbers standardisation rule for the postcode column in this case, to ensure that these types of values are treated as strings.
Thanks Henry ... I will definitely do that. I am trying to run a few things to isolate the rows causing me the problem, as I no longer want to just fix the problem - I really want to know what caused it. Just me taking things too far as always, but my OCD won't let me leave it alone. Will let you know how I get on.
Hello @stevenmckinnon, this looks like a JDBC-related error. Can you open a support ticket so we can get some technical resources to dig into why this is happening? https://www.edq.com/documentation/contact-support/uk/
Hi @stevenmckinnon, I looked at this again and it seems pretty clear that you may be trying to update the SQL database from your export with data that is too wide for the target.
Depending on your schema definition for the target in the database you may have set some maximum lengths or types for the target table.
The easiest way to identify the offending rows is either to create a rule ahead of the export that filters out records that are too long, or to run a profile on the reworked data and look at the longest length. You can do this with suitable validation rules.
Here's a simple one for a length check:
Here's a different one for multiple format checks:
These will also catch nulls and over-length postcodes.
You could also use a regular expression match or leverage a constant.
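Outside Data Studio, the same validation logic can be sketched in a few lines of Python. The 9-character limit and the postcode pattern below are illustrative assumptions (the regex is a loose UK postcode shape, not the full official specification):

```python
import re

MAX_LEN = 9  # assumed maximum width of the target postcode column

# Loose UK postcode shape, e.g. "SW1A 1AA" - illustrative only.
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$", re.I)

def postcode_problems(value):
    """Return a list of rule names the value fails."""
    problems = []
    if value is None or value.strip() == "":
        problems.append("null_or_blank")
        return problems
    if len(value) > MAX_LEN:
        problems.append("over_length")
    if not POSTCODE_RE.match(value.strip()):
        problems.append("bad_format")
    return problems

rows = ["SW1A 1AA", None, "1E-32531", "   "]
for r in rows:
    print(r, postcode_problems(r))
```

Note that "1E-32531" is only 8 characters long, so a pure length check passes it - the format check is what catches it.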
Hi @stevenmckinnon, in this particular case it looks like your workflow is exporting to SQL Server with the CREATE table option, rather than insert:
In this case, the error
java.sql.SQLException: [Pandora][SQLServer JDBC Driver][SQLServer]The size (<length>) given to the column '<col name>' exceeds the maximum allowed for any data type (8000).
refers to SQL Server's limit of 8000 bytes (the equivalent of 8000 single-byte characters) for char and varchar columns. Data Studio is attempting to create a column in the new target table that is wider than SQL Server will allow. A solution could be to allow Data Studio to create columns with varchar(max), allowing for longer values to be stored, but using up more storage space on the SQL Server instance. I'll look at whether we want to allow this option.
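A pre-flight check for this limit can be sketched in plain Python (this is an illustration, not a Data Studio feature): measure the widest value per column before export, and compare against SQL Server's 8000-byte cap for char/varchar.

```python
# SQL Server rejects CREATE TABLE with a char/varchar column wider
# than 8000 bytes (varchar(max) is the escape hatch).
SQL_SERVER_MAX_VARCHAR = 8000

def column_width_ok(values, encoding="utf-8"):
    """Return (max_byte_length, fits) for a proposed varchar(n) column."""
    widest = max(len(v.encode(encoding)) for v in values if v is not None)
    return widest, widest <= SQL_SERVER_MAX_VARCHAR

# A 32,000+ character "postcode" produced by the scientific-notation
# issue forces a column wider than SQL Server allows:
bad = ["SW1A 1AA", "0." + "0" * 32529 + "1"]
print(column_width_ok(bad))    # (32532, False)
```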
In the meantime, it's likely that the very long values in the postcode column indicate that something has gone wrong in the length check. If you can show us how you're doing that, and some results, we should be able to work it out!
Hi Henry and Clinton
I should have mentioned at the start of my post that I had included a length check on the postcode when I first encountered this problem, and it didn't seem to fix it. I have put a filter in after the length check to exclude all the rows flagged "TRUE".
In the export step I am using "CREATE" rather than insert.
That looks like it should work Steve.
One thing I noticed: Should the length check be applied to the New_Postcode column rather than Postcode?
To help diagnose the problem further, you could also Branch after the final Filter step and Profile the new postcode column, checking the max length there to make sure there's nothing remaining that looks wrong:
In show data on the Profile step, you would see a Longest Length stat:
@stevenmckinnon have you set the postcode column up as accepting nulls?
So I also spotted the "New Postcode" bit and ran the workflow again after fixing that mistake, but it spat out the same error message. So ... I ran the Profile step and have attached the output. Anything obvious in it?
At least one row ... so ultimately you have one or more problematic rows. You may need to apply a Remove Noise step on this column, as you may have trailing spaces?
FYI I updated the previous comment to give more detail, so please re-read if you were confused before!
Just to keep you updated: I ran a filter to remove all purely numerical values, as a UK postcode can't be purely numerical, and it took out approx. 21,000 rows. I re-ran the workflow and it is now processing, so you were right - there must have been something hiding in the numerical values, like a scientific expression.
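For anyone following along, the filter Steve describes can be sketched in Python (an illustration, not the actual Data Studio rule): a valid UK postcode always contains letters, so anything that parses cleanly as a number can be excluded, including scientific-notation values like 1E-32531.

```python
def is_purely_numeric(value):
    """True if the whole value reads as a number, incl. 1E-32531 style."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

rows = ["SW1A 1AA", "123456", "1E-32531", "EC1A 1BB"]
kept = [r for r in rows if not is_purely_numeric(r)]
print(kept)    # ['SW1A 1AA', 'EC1A 1BB']
```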
Out of interest: my workflow imported 5 tables, 4 from JDBC and 1 from a snapshot. I can edit the column values on the JDBC tables through the Profile - Edit column function, but I couldn't work out how to do a similar task on the snapshot. The snapshot has 14m rows in it and took several hours to produce, so I didn't fancy re-creating it just for testing purposes. Is it possible to edit the column values of a snapshot?
Hi Steve, good to hear you're making progress.
When you're changing the data types / data standardisation rules on a column in a JDBC table via the Preview -> Edit dialog, you're affecting how the values will be loaded in. Because a snapshot is already a loaded table, the same thing isn't possible. However, you can manipulate the data based on its type for a snapshot:
If you use the Datatype function, you can view the type for each value:
You can check and change data types in the Transform step. In this example, I can change any values with Decimal type to Alphanumeric:
To see the result more clearly, I've added additional columns to show the data type of the values in the "number" column before and after the conversion.
Note that some conversions will throw errors: converting a Decimal to an Alphanumeric is fine, but converting an Alphanumeric to a Decimal would fail with an error if the value couldn't actually be converted.
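The asymmetry in those conversions can be sketched in Python (a rough analogue, not the Transform step itself): going from decimal to text always works, while the reverse only succeeds when the text is actually numeric.

```python
from decimal import Decimal, InvalidOperation

def decimal_to_alphanumeric(value):
    """Decimal -> text: always safe."""
    return str(value)

def alphanumeric_to_decimal(value):
    """Text -> Decimal: fails for non-numeric text."""
    try:
        return Decimal(value)
    except InvalidOperation:
        raise ValueError(f"cannot convert {value!r} to Decimal")

print(decimal_to_alphanumeric(Decimal("12.5")))   # '12.5'
print(alphanumeric_to_decimal("12.5"))            # Decimal('12.5')
# alphanumeric_to_decimal("SW1A 1AA") would raise ValueError
```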