Retrieving Datasets from an external system using an API

Marco_13112001
edited December 2023 in General

Good morning,

We are currently at the beginning of a project where we need to retrieve data from an external system using an API. We have developed a small script (see below), but I would like to get some advice before running it. The idea is to collect the datasets from the external system and import them into an Aperture space.

import requests

# Credentials for Zendesk API
zendesk_email = "your_zendesk_email"
zendesk_password = "your_zendesk_password"
zendesk_url = "https://your_zendesk_url"

# Credentials for Aperture Data Studio API
aperture_url = "https://your_aperture_url"
aperture_token = "your_aperture_token"

# Get the list of tickets from Zendesk; the response wraps them in a "tickets" key
response = requests.get(f"{zendesk_url}/api/v2/tickets.json", auth=(zendesk_email, zendesk_password))
response.raise_for_status()
tickets = response.json()["tickets"]

# Iterate through the tickets and upload each one to Aperture Data Studio
for ticket in tickets:
    ticket_id = ticket["id"]
    ticket_response = requests.get(f"{zendesk_url}/api/v2/tickets/{ticket_id}.json", auth=(zendesk_email, zendesk_password))
    ticket_response.raise_for_status()

    # Upload the ticket to Aperture Data Studio (token header format may vary by version)
    requests.post(
        f"{aperture_url}/api/v1/spaces/default/datasets",
        headers={"Authorization": f"Bearer {aperture_token}"},
        json={"name": ticket_id, "data": ticket_response.json()},
    )

Many thanks in advance for your support.

Best Answer

  • Josh Boxer (Administrator)

    Hi Marco

    The API you mention, 'spaces/default/datasets', will list the existing Datasets. There is a /create endpoint, but it expects a CSV, so you would need to transform the Zendesk API output or use their CSV exports.
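
    For illustration, here is a minimal sketch of that transformation, reusing the tickets list and Aperture credentials from the script above. The /create path, the multipart field name and the Bearer auth header are assumptions; check the Data Studio API reference for your version before relying on them.

    import csv
    import io
    import requests

    def tickets_to_csv(tickets):
        # Flatten a list of Zendesk ticket dicts into an in-memory CSV string,
        # keeping only a few illustrative fields and ignoring the rest
        buffer = io.StringIO()
        fields = ["id", "subject", "status", "created_at"]
        writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(tickets)
        return buffer.getvalue()

    # Hypothetical upload: the endpoint path and auth scheme are assumptions
    csv_data = tickets_to_csv(tickets)
    requests.post(
        f"{aperture_url}/api/v1/spaces/default/datasets/create",
        headers={"Authorization": f"Bearer {aperture_token}"},
        files={"file": ("zendesk_tickets.csv", csv_data, "text/csv")},
    )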

    You could build a custom step to request data from a 3rd party API https://docs.experianaperture.io/data-quality/aperture-data-studio-v2/extend-data-studio-functionality/use-the-sdk/

    It is difficult to give further help without really understanding what you are trying to achieve. Zendesk seems to have a few different options to extract data, so it is unclear why you have settled on the API.

    Interested to know what you are doing with the data once it is in Data Studio. If you have not already done so, I would suggest manually extracting and uploading the data, then building out the Workflows etc. as the first step.

Answers

  • Marco_13112001

    @Josh Boxer Thanks for the reply. The idea here is to be able to consume the data for analysis and possible cleansing purposes prior to a migration. Also, we want to avoid manual interactions and automate the process as much as we can.

    Just going back to the script and your comment "The API you mention, 'spaces/default/datasets', will list the existing Datasets": if we change it to "spaces/default/datasets/create", would this work?

    Just to confirm: that API will only accept CSV files, is this correct?

    Would an API work with a dropzone, i.e. get the dataset through an API and deposit the .json file in the specified dropzone?

    Many thanks again for your help.

  • Josh Boxer (Administrator)

    A dropzone will load file data into an existing Dataset. If the file data is in JSON format, you may need a custom parser to load it successfully. https://github.com/experiandataquality/aperture-data-studio-sdk#creating-a-custom-parser
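
    As a rough sketch of the automation described above, assuming the dropzone is a watched directory (the /data/dropzones/zendesk path below is hypothetical) and reusing the tickets_to_csv helper from the earlier reply. Writing CSV rather than JSON sidesteps the custom parser:

    from pathlib import Path

    # Hypothetical dropzone location; use whatever directory your
    # Data Studio dropzone is actually configured to watch
    dropzone_dir = Path("/data/dropzones/zendesk")

    # Convert the tickets to CSV so the existing Dataset can load the file
    # without a custom JSON parser
    csv_data = tickets_to_csv(tickets)
    (dropzone_dir / "zendesk_tickets.csv").write_text(csv_data, encoding="utf-8")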

    FYI - this thread on Autonomous REST APIs might also be interesting, though it also highlights other issues people tend to run into with APIs and large volumes of data.