Support for VARBINARY(MAX) field type in ADS

Hello I have a user case that requires migrating Salesforce objects that contain attachments such as HTML Body, PDF's, Word, Excel and jpg's attachments to another instance of Salesforce. These are normally stored as a VARBINARY(MAX) in a MS SQL database and I think would be a Blob in SQL Lite. Given these data types are not supported in Aperture Data Studio ADS has anyone heard of a workaround using the SDK, custom exporters, pushdown or other hacks so that the data is stored in underlying SQL Lite DB and can be exported to a target without having to be part of a dataset. No processing, transformations or sorting are required on the field. Thanks.

Answers

  • Henry Simms
    Henry Simms Administrator
    edited November 2024

    Data Studio doesn't currently have a "BLOB" type, so any data imported from this type of field is treated as ALPHANUMERIC.

    Data Studio's maximum value size is 32k bytes and most attachments will be larger than this. As a result, when loading attachment contents from the VersionData field in the ContentVersiontable, we see the max value size at play for the BLOB data itself (VersionData field), which is truncated at 32,766 bytes (the reported ContentSize in SF is 164,397 for the highlighted row):

    Much smaller attachments (rows 1,3,4) can be loaded successfully.

    In future there may be an option to increase the maximum size, or do some compression, for certain values where we don't intent that any operation would be performed on the value - In this case you'd just want to pass the LOB right through from source to target. However this is not currently possible.

    Henry

    Appendix - Notes on how Salesforce stores attachments:

    Enhanced Files Model: Salesforce has gradually moved from the Attachment model to a newer, more flexible "Files" model. This means that attachments will now be stored in the ContentVersion and ContentDocument objects instead of the Attachment object. To fetch files, you'd typically query these tables:

    • ContentDocument Represents a document that has been uploaded to a library
      • ContentDocument is the parent object of both ContentVersion and ContentDocumentLink
      • ContentDocument records can’t be inserted or updated directly - The document is created by Salesforce when a ContentVersion record is inserted and its ContentDocumentId field is null
    • ContentVersion: This represents a specific version of the document.
      • For ContentVersion, you can retrieve the file content through the VersionData column, which is also in BLOB format.
    • ContentDocumentLink: This object links (describes the relationship between) the document and the parent object (e.g., Group, Case, Opportunity, etc.). Represents where it is shared
      • A document can be (usually is) shared with multiple objects, eg a Case and a User
      • You can't run a query without filters against ContentDocumentLink.
      • A SOQL query must filter on one of Id, ContentDocumentId (get all linked objects for a doc), or LinkedEntityId (get all linked docs for a given object)
      • This may represent a challenge for Data Studio, as it makes it difficult to associate objects with Content Documents in bulk

    Selected references and sources:

    - https://help.salesforce.com/s/articleView?id=experience.collab_salesforce_files_parent.htm&type=5

    - https://salesforce.stackexchange.com/questions/234279/how-to-query-contentdocumentlink-in-soql-and-then-upsert-records-in-data-loader

    - https://appiphony.com/blog/contentdocument-contentversion-and-contentdocumentlink/

    - https://www.sfapex.com/how-to-copy-content-objects-between-salesforce-environments/