Fixed Width file parsing in Aperture Data Studio v1 versus v2

Sueann See Experian Super Contributor
edited December 2023 in General

Fixed width files are a common file format that you may need to process. They are often more challenging to work with than comma separated or Excel files, because the column boundaries are not marked in the data itself.
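To see why, compare the two formats in a minimal Python sketch. The records and the column positions below are invented for illustration and are not taken from any real layout: a comma separated line carries its own field boundaries, whereas a fixed width line can only be split if you already know where each field starts and ends.

```python
import csv

# Invented sample data; not from any real layout.
csv_line = "1042,Jane Doe,NY"
fixed_line = "1042Jane Doe  NY"

# A delimited line describes its own field boundaries.
csv_fields = next(csv.reader([csv_line]))

# A fixed width line needs an external layout: (name, start, end) offsets,
# zero-based with an exclusive end. These offsets are assumptions.
layout = [("id", 0, 4), ("name", 4, 14), ("state", 14, 16)]
fixed_fields = {name: fixed_line[start:end].strip() for name, start, end in layout}

print(csv_fields)
print(fixed_fields)
```

Without the layout, nothing in the second line says where the name ends and the state begins.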

Here is an example of a small section of a layout for a fixed width file to be submitted to the IRS (Internal Revenue Service) of the United States Government.

Reference: https://www.irs.gov/pub/irs-pdf/p1220.pdf

The fixed width file would look something like this:

In v1, if you attempt to upload the file directly, you will find that it cannot be parsed properly.

With some help from the documentation, you will find that you need to supply the schema or data structure in a COBOL (.cob) or Data Definition Language (.ddl) file and upload it along with your fixed width file. This can be challenging for someone without technical expertise.

In v2, this is made much simpler.

Go to Datasets, Add Dataset, Upload File. The Dataset wizard provides a step-by-step guide to uploading a file. 

Browse for the fixed width (.dat) file. Click Next.

The system can automatically detect and recommend the appropriate Parser, Character set, Language and region. 

Since it is a fixed width file, a metadata file is expected before you proceed to the next step. The metadata file is where you define the expected columns and provide the annotation. The information you need to define the metadata is often already provided by the supplier or processor of the data, as in the IRS example.
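Conceptually, the metadata file plays the role of a layout that tells the parser where each column lives. The Python sketch below illustrates that idea only. The metadata column names (`name`, `start`, `length`), the 1-based start positions and the sample records are assumptions made for this example, and are not the format that Aperture Data Studio requires.

```python
import csv
import io

# Hypothetical metadata: column name, 1-based start position, length.
METADATA_CSV = """name,start,length
record_type,1,1
payment_year,2,4
payer_name,6,10
"""

# Invented fixed width records, each 15 characters wide.
DATA = "T2023ACME CORP \nT2022GLOBEX    \n"

def load_layout(text):
    """Read the metadata CSV into (name, start_index, end_index) tuples."""
    layout = []
    for row in csv.DictReader(io.StringIO(text)):
        start = int(row["start"]) - 1          # convert to a 0-based index
        layout.append((row["name"], start, start + int(row["length"])))
    return layout

def parse_fixed_width(data, layout):
    """Slice every non-empty line according to the layout."""
    return [
        {name: line[start:end].strip() for name, start, end in layout}
        for line in data.splitlines()
        if line
    ]

rows = parse_fixed_width(DATA, load_layout(METADATA_CSV))
for row in rows:
    print(row)
```

Keeping the layout in a separate file means the same parsing logic can be reused for any fixed width file once its columns have been described.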

The metadata can be populated in Excel, saved and uploaded as a comma separated (.csv) file per the following example:

Once the metadata file has been uploaded, a preview of the file is shown so that you can verify that the columns have been allocated as expected.
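A wrong start position or length usually shows up in the preview as values shifted between columns. The same kind of check can be sketched in Python before uploading. The `check_layout` helper and the column definitions below are hypothetical and are not part of Aperture Data Studio; the helper reports columns that overlap, leave gaps, or do not match the record length.

```python
def check_layout(layout, record_length):
    """Return a list of problems found in a fixed width layout.

    layout is a list of (name, start, length) tuples with 1-based starts.
    """
    problems = []
    expected_start = 1
    for name, start, length in sorted(layout, key=lambda col: col[1]):
        if start < expected_start:
            problems.append(f"{name} overlaps the previous column")
        elif start > expected_start:
            problems.append(f"gap before {name}: positions {expected_start}-{start - 1}")
        expected_start = max(expected_start, start + length)
    if expected_start - 1 > record_length:
        problems.append("layout extends past the record length")
    elif expected_start - 1 < record_length:
        problems.append(f"positions {expected_start}-{record_length} are not covered")
    return problems

# Invented layouts for a 15-character record.
good = [("record_type", 1, 1), ("payment_year", 2, 4), ("payer_name", 6, 10)]
bad = [("record_type", 1, 1), ("payment_year", 3, 4), ("payer_name", 6, 10)]

print(check_layout(good, 15))
print(check_layout(bad, 15))
```

An empty list means the columns tile the record exactly. Gaps can be legitimate when a layout contains filler positions, so treat the messages as prompts to review rather than as errors.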

Since you have defined the data type, summary and tags in the metadata file, you will see that these values have been populated accordingly as you navigate to the next Annotate columns step. You can also choose to further edit the values on-screen as required.

Click Next to populate details such as the Name and description of the Dataset, then click Finish.

The new Dataset is then listed under Datasets.

Clicking on the Dataset name reveals the data that has been parsed appropriately.

How was your experience with fixed width file parsing? What more would you like to see for fixed width file parsing?

Comments

  • Henry Simms Administrator

    Thanks @Sueann See. I have fixed width data in a .txt file rather than a .dat file, so I had to select the correct parser when loading:

    Once I'd done that it worked great. Here's a 4 line file of dummy data in fixed-width format, and the associated metadata file that I used. Love that you can add tags and descriptions as part of the upload.