What's the syntax for the Dataset "file pattern" text when loading from an SFTP dropzone?

Henry Simms (Administrator)
edited October 16 in General

I'm setting up a Dataset that automatically refreshes with data from files dropped into an External System dropzone (SFTP in this case). I only want to load specific files, determined by the file name.

To do this, I was going to use the "Starts with file pattern" config option, but I couldn't find any information about the syntax that was supported.

Are wildcards or regex supported? And can I specify paths to sub-folders here too?


Best Answer

  • Dan Mason (Experian Employee)
    Answer ✓

    From my experience playing around with this, neither wildcards (such as the asterisk) nor regex are supported (but I may be wrong!)

    What you enter into 'Starts with file pattern' will literally do what it says on the tin:

    Entering 'CustomerABC' would mean that only files starting with CustomerABC would be loaded.

    I'm not sure about paths to sub-folders in the Dataset settings, but the folder can be configured in the External System settings.
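The literal "starts with" behaviour described in this answer can be pictured with a small Python sketch. This is only an illustration of plain prefix matching, not Data Studio's actual implementation; the function name `matches_starts_with` and the sample file names are made up for the example, and case-sensitive matching is an assumption.

```python
def matches_starts_with(file_name: str, pattern: str) -> bool:
    """Literal prefix test: the pattern is never read as a wildcard or regex."""
    return file_name.startswith(pattern)


# Hypothetical file names sitting in a dropzone folder.
files = [
    "CustomerABC_2023.csv",
    "CustomerABC.csv",
    "CustomerXYZ.csv",
    "Old_CustomerABC.csv",
]

# Only names that begin with the exact text are kept.
selected = [f for f in files if matches_starts_with(f, "CustomerABC")]
print(selected)

# An asterisk is treated as an ordinary character, so no file matches here.
wildcard_attempt = [f for f in files if matches_starts_with(f, "Customer*")]
print(wildcard_attempt)
```

Under this reading, a pattern such as 'Customer*' would only match a file whose name really begins with the characters 'Customer*'.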

Answers

  • Hi @Dan Mason I would like to set up an External Server Dropzone (SFTP) - any tips? Thanks

  • Dan Mason (Experian Employee)

    Hi @James T Sidebotham

    If you haven't already, you can create an SFTP External System connection as follows:

    System > External systems > Add new External System, then set the connection type to SFTP and complete the fields needed to configure the connection.

    Once that is set up, you can add a new Dataset from your External System connection by selecting your chosen file from the SFTP directory you have configured. When defining the Dataset settings, select "Enable external system dropzone".

    Each time a new file sharing the same schema is added to the SFTP directory, your dataset will automatically refresh with the new data, either overwriting the existing dataset or adding another batch, depending on your preference.

    For more detail and a step-by-step guide, see the user documentation.

    Hope this helps!

  • Hi @Dan Mason, I set up my external SFTP system with the option 'Root directory' = '\123DZ'. I then went into the add Dataset screen and picked one of the files in that \123DZ folder (I have multiple files, all with the same layout but different names). I configured the Dataset with "Enable external system dropzone" as shown, and left the 'Folder to use as dropzone' option as-is (it shows me '\123DZ/'). The Dataset is also multi-batch.

    I can add the data, but if I hit Refresh on the Dataset, it just reloads that same file again and does not load any of the other files.

    I was hoping that all files in the \123DZ folder would be loaded into the Dataset automatically, once only, and that they would then be removed from the \123DZ folder. Am I assuming incorrectly?

    Thanks

  • Henry Simms (Administrator)

    Hi @James T Sidebotham

    Only files that are created or modified in the watched "dropzone" folder after the Dataset has been configured to watch it will be loaded. From your description, it sounds like the dropzone folder already contained multiple files before the watcher started, so Data Studio will not pick these up (they have not changed).

    If you add a new file to the watched dropzone folder, or modify an existing file in it, you should see it picked up at the next poll (the poll interval is configurable; the default is one minute). You won't need to click Refresh, which just reloads the existing file; the load will happen automatically.

    By design, files are not deleted from the SFTP dropzone folder after load. Data Studio just ignores them and only monitors for changes.
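As a rough mental model of the behaviour described in this reply, here is a minimal Python sketch of a polling watcher. It is not how Data Studio is implemented: it watches a local temporary folder rather than an SFTP server, detects changes by comparing last-modified times, and the helper names `snapshot` and `changed_since` are hypothetical.

```python
import os
import tempfile
from typing import Dict, List


def snapshot(folder: str) -> Dict[str, float]:
    """Map each file name in the folder to its last-modified time."""
    return {
        entry.name: entry.stat().st_mtime
        for entry in os.scandir(folder)
        if entry.is_file()
    }


def changed_since(previous: Dict[str, float], current: Dict[str, float]) -> List[str]:
    """Names of files that are new or modified compared with the previous poll."""
    return sorted(
        name
        for name, mtime in current.items()
        if name not in previous or previous[name] != mtime
    )


with tempfile.TemporaryDirectory() as dropzone:
    # A file that is already present before the watcher starts.
    existing = os.path.join(dropzone, "existing.csv")
    with open(existing, "w") as fh:
        fh.write("a,b\n")

    # The watcher starts here: whatever is present now is the baseline.
    baseline = snapshot(dropzone)

    # First poll: nothing has changed, so the pre-existing file is ignored.
    first_poll = changed_since(baseline, snapshot(dropzone))

    # A new file is dropped in, so the next poll picks it up.
    with open(os.path.join(dropzone, "new.csv"), "w") as fh:
        fh.write("a,b\n")
    current = snapshot(dropzone)
    second_poll = changed_since(baseline, current)
    baseline = current

    # Modifying the old file (here, bumping its timestamp) makes it eligible too.
    os.utime(existing, (baseline["existing.csv"], baseline["existing.csv"] + 60))
    third_poll = changed_since(baseline, snapshot(dropzone))

    # Nothing is deleted from the folder after "loading".
    remaining = sorted(os.listdir(dropzone))

print(first_poll, second_poll, third_poll, remaining)
```

In this model, the files that were already in the folder when the watch began never appear in a poll result until they change, which matches the situation described in the question above.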

  • James T Sidebotham
    edited October 21

    Ah thanks, that did it. I removed the files and added them again once the folder was set as a watched dropzone, and they all loaded nicely. Much appreciated!