📑 API paging with the Autonomous REST Connector
In this article I want to share a few approaches to configuring paging when accessing REST APIs using Aperture Data Studio's Autonomous REST (AutoREST) connector.
What is the AutoREST connector?
If you're reading this, you're probably already familiar with the AutoREST connector, but in case you're new to it, check out:
- The AutoREST user documentation
- The "Aperture REST API calls as Datasets" community article, which talks through how to set up an AutoREST connection to Aperture's own REST API to extract metadata not yet available directly from the UI.
What is API paging?
It is impractical for a single REST API call to return huge amounts of data. More commonly, if a large number of records need to be returned, it is split into multiple pages via a set of paginated endpoints.
For example consider this list of Samsung mobile phones from wikipedia. There are 288 phones on the list, but when we use Wikipedia's REST API to query the list, we only retrieve the first 10 records:
- 1st call (page 1):
https://en.wikipedia.org/w/api.php?action=query&generator=categorymembers&gcmtitle=Category:Samsung_mobile_phones&format=json
- Response:
Any request using this API is automatically limited to 10 responses, and to get the next "page" of 10 records you have to make a different request.
How does API paging work?
Different APIs use paging in different ways, and you'll need to consult the specific documentation for the API you're using to find out how it's done. In the Wikipedia API example above, the next page is returned by passing the gcmcontinue
URL parameter in the API, using the value from the previous page.
From the initial Wikipedia call, the following was returned as part of the response:
"gcmcontinue":"page|2c4c324e4e464403064e2a424e524436042c4c324e4e4644011b01dcbfdcbfdc0a|31138590"
We use the value in the call to get the next page
- 2nd call (page 2):
https://en.wikipedia.org/w/api.php?action=query&generator=categorymembers&gcmtitle=Category:Samsung_mobile_phones&format=json&gcmcontinue=page|2c4c324e4e464403064e2a424e524436042c4c324e4e4644011b01dcbfdcbfdc0a|31138590
- Response:
The 2nd call returns phones 11-20 from the list, and another gcmcontinue
value that we can use to retrieve page 3, and so on.
How do I page APIs using AutoREST?
Naturally, when using the AutoREST connector you're going to want to page through multiple API calls to return a full dataset, rather than just the first page. This is implemented in the Progress DataDIrect AutoREST connector used in Aperture, and here I'll expend on the connector's paging documentation with some specific examples.
Next page token paging
The Wikipedia example we've been discussing so far uses an approach we'll call next page token paging, where the current page provides the token we need to get to the next page.
Without implementing any paging settings, when we use the Wikipedia API to return the Samsung mobiles phones category, we'll find that only 10 rows are returned in our Aperture Dataset.
To return the full set, we need to specify the following properties in the .rest file:
nextPageParameter
: "gcmcontinue". This tells the connector which URL parameter to use to pass the "next page" token valuenextPageElement
: "continue/gcmcontinue". This tells the connector where to find the value to pass in the nextPageParameter to get the next pagepageSizeParameter
: 10. This specifies how many records will be returned in each call
The complete .rest file entry for this endpoint would be:
{
"Samsung_Phones_paged": {
"#path": ["https://en.wikipedia.org/w/api.php?action=query&generator=categorymembers&gcmtitle=Category:Samsung_mobile_phones&format=json"],
"query":{
"pages{Integer}":{
"pageid":"Integer",
"title":"VarChar(64)"
}
},
"#nextPageElement": "continue/gcmcontinue",
"#nextPageParameter": "gcmcontinue",
"#pageSizeParameter": 10
}
Page number paging
Another way REST APIs implement paging is using page numbers. An example of this is the Punk API beers endpoint.
- Call:
https://api.punkapi.com/v2/beers
- Response: First 25 beers from a full set of 325
From the API documentation:
Using this information, and Progress's driver documentation, we can establish that the following settings are needed in the relevant .rest file:
- maximumPageSize: 20 (number of records per page. Although the default is 25, we can increase or decrease this). This is a max, because the last page may return fewer records.
- firstPageNumber: 1 (the number of the page to start incrementing from. For some APIs this would be 0 rather than 1)
- pageSizeParameter: "per_page" (the URL parameter to use to pass the number of records per page)
- pageNumberParameter: "page" (the URL parameter to use to pass the page number)
The complete .rest file entry for this endpoint would be:
{
"beers_paged": {
"#path": ["https://api.punkapi.com/v2/beers"],
"id": "Integer,#key",
"name": "VarChar(64)",
"first_brewed": "VarChar(64)",
"description": "VarChar(909)",
"abv": "Double",
"volume": {
"value": "Integer",
"unit": "VarChar(64)"
},
"brewers_tips": "VarChar(301)",
"#maximumPageSize":20,
"#firstPageNumber":1,
"#pageSizeParameter":"per_page",
"#pageNumberParameter":"page"
}
}
Row offset paging
Offset-based pagination APIs work by specifying the number of records to skip, and the maximum number of records to return, in a query.
You can see an example of this in any OData API, where the following pa
- $top to define the number of records to return in the query.
- $skip  to indicate the number of records in the queried collection that are to be skipped (not included in the result)
- $count to indicate that the response return a count of the resources in the response
Aperture itself makes use of an OData API when a Dataset is published. To use pagination for an OData API with the AutoREST connector:
- Add the
$count=true
parameter to the request path. - Add the following paging params to the .rest file:
"#totalRowsElement":"@odata.count",
"#maximumPageSize":100,
"#pageSizeParameter":"$top",
"#rowOffsetParameter":"$skip"
The maximumPageSize
is configurable.
Example .rest files
Example rest files attached for the Wikipedia, Punk API and OData (using an Aperture published data set).