Paginating API responses

Pagination is a crucial concept that needs to be understood and designed into a REST API before it’s built. It is often forgotten about until it’s too late and API consumers are already integrating with the API, so it’s important to get stuck into doing things the right way early on.

What is API Pagination?

At first it’s easy to imagine that collections only have a few hundred records. That would not be too taxing for the server to fetch from the database, turn into JSON, and send back to the client, but as soon as the collection gets into thousands of records things start to fall apart in wild and unexpected ways.

For example, picture a coworking company that expected to mostly host startups of 10-50 people, but then Facebook and Amazon rock up with ~10,000 employees each. Every time somebody loads that data the entire API server crashes, along with every application that uses it, and every application that uses that.

Breaking down a large dataset into smaller chunks helps to solve this, and it works a lot like pagination does in the browser: when searching on a functioning search engine like DuckDuckGo or Ecosia, the results are broken down into page 1, page 2… page 34295. It doesn’t just throw every single result into the browser in the world’s longest, slowest web response, forcing computer fans to whir until they snap off as the browser tries to render infinite HTML to the screen.

This is pagination in action, and pagination in an API is exactly the same idea. Much like on web pages, it is done with query string parameters on a GET request.

GET /items?page=2

The main difference is that the client is not seeing a list of buttons in HTML. Instead they are getting metadata or links in the JSON/XML response. How exactly that looks depends on which pagination strategy is picked, and there are a few to choose from, each with its own pros and cons.

Choosing a Pagination Strategy

To help pick a pagination strategy, let’s look at some examples and talk through the pros and cons.

  1. Page-Based Pagination
  2. Offset-Based Pagination
  3. Cursor-Based Pagination

Page-Based Pagination

Page-based pagination uses page and size parameters to navigate through pages of data.

GET /items?page=2&size=10

This request fetches the second page, with each page containing 10 items maximum.
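As a rough sketch of what the server does with those two parameters, here is the arithmetic applied to an in-memory list standing in for the data source. The function name `get_page` and the 1-based page numbering are assumptions made for illustration, not something any particular framework prescribes.

```python
def get_page(items: list, page: int, size: int) -> list:
    """Return the items on a 1-based page; a page past the end yields []."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive integers")
    start = (page - 1) * size  # page 2 with size 10 starts at index 10
    return items[start:start + size]


items = list(range(1, 96))  # 95 hypothetical records with ids 1..95

second_page = get_page(items, page=2, size=10)  # ids 11..20
last_page = get_page(items, page=10, size=10)   # ids 91..95, a short final page
beyond_end = get_page(items, page=11, size=10)  # nothing left, so an empty list
```

Returning an empty list for a page past the end, rather than raising an error, matches the "200 OK with an empty array" behaviour this section describes.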

There are two main ways to show pagination data in the response. The first is to put the pagination metadata directly in the JSON body.

{
  "data": [
    ...
  ],
  "page": 2,
  "size": 10,
  "total_pages": 100
}

This is pretty common, but forces the client to know a whole lot about the pagination implementation, which could mean some guesswork (which could be guessed wrong), or reading a whole lot of documentation about which bit goes where and what is multiplied by whom.

The best way to help the client is to give them links, which at first seems confusing but it’s just HATEOAS (Hypermedia As The Engine Of Application State), also known as Hypermedia Controls.

It’s a fancy way of saying “give them links for things they can do next” and in the context of pagination that means “give them links to the next page, the previous page, the first page, and the last page.”

{
  "data": [
    ...
  ],
  "meta": {
    "page": 2,
    "size": 10,
    "total_pages": 100
  },
  "links": {
    "self": "/items?page=2&size=10",
    "next": "/items?page=3&size=10",
    "prev": "/items?page=1&size=10",
    "first": "/items?page=1&size=10",
    "last": "/items?page=100&size=10"
  }
}

Whenever there is a next link, an API consumer can show a next button, or start loading the next page of data to allow for auto-scrolling.

If the next response returns data, it will give a 200 OK response and they can show the data.

If there is no data then it will still be a 200 OK but there will be an empty array, showing that everything was fine, but there is no data on that page right now.
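One way a server might assemble a response shaped like the example above is sketched here. The helper name `build_page_response`, its parameters, and the choice to leave out `next` on the last page and `prev` on the first are all assumptions for illustration.

```python
import math
from urllib.parse import urlencode


def build_page_response(data: list, page: int, size: int, total_items: int,
                        base_path: str = "/items") -> dict:
    """Wrap one page of data with meta and links, omitting links that do not apply."""
    total_pages = max(1, math.ceil(total_items / size))

    def link(page_number: int) -> str:
        return f"{base_path}?{urlencode({'page': page_number, 'size': size})}"

    links = {"self": link(page), "first": link(1), "last": link(total_pages)}
    if page < total_pages:
        links["next"] = link(page + 1)  # only advertise a next page if one exists
    if page > 1:
        links["prev"] = link(page - 1)

    return {
        "data": data,
        "meta": {"page": page, "size": size, "total_pages": total_pages},
        "links": links,
    }


# Hypothetical call: page 2 of a 1,000 record collection, data left empty for brevity
response = build_page_response(data=[], page=2, size=10, total_items=1000)
```

Because the client only ever follows the links it is given, the query parameter names could change later without the client needing to know.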

Ease of Use

  • Pro: Simple to implement and understand.
  • Pro: Easy for users to navigate through pages.
  • Pro: UI can show page numbers and know exactly how many pages there are.
  • Pro: Can optionally show a next/previous link to show consumers if there are more pages available.

Performance

  • Con: Involves counting all records in the dataset which can be slow and hard to cache depending on how many variables are involved in the query.
  • Con: Gets steadily slower as the dataset grows, because the database typically has to count every matching record and skip past all the earlier ones. Hundreds are fine. Thousands are rough. Millions are horrendous.

Consistency

  • Con: When a consumer loads the latest 10 records, then a new record is added to the database, then a user loads the second page, they’ll see one of those records twice. This is because there is no such concept as a “page” in the database, just saying “grab me 10, now the next 10” does not differentiate which records they actually were.
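The duplicate record problem is easy to reproduce. This is a small simulation, assuming a newest-first list of integer ids stands in for the database table and a hypothetical `fetch_page` helper does the slicing.

```python
def fetch_page(records: list, page: int, size: int) -> list:
    """Slice out a 1-based page, the way LIMIT/OFFSET would."""
    start = (page - 1) * size
    return records[start:start + size]


# Newest first: ids 30 down to 1
records = list(range(30, 0, -1))

first = fetch_page(records, page=1, size=10)   # ids 30..21
records.insert(0, 31)                          # a new record arrives at the top
second = fetch_page(records, page=2, size=10)  # ids 21..12, so 21 shows up again

duplicates = set(first) & set(second)
```

The new record pushes everything down by one position, so the record that was last on page 1 becomes the first on page 2.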

Offset-Based Pagination

Offset-based pagination is a more straightforward approach. It uses offset and limit parameters to control the number of items returned and the starting point of the data, which avoids the concept of counting everything and dividing by the limit, and just focuses on using offsets to grab another chunk of data.

GET /items?offset=10&limit=10

This request fetches the second page of items, assuming each page contains a maximum of 10 items, and does not worry itself with how many pages there are. This can help with infinite scrolls or automatically “importing” lots of data one chunk at a time.
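Here is a minimal sketch of how those parameters could map onto a SQL query, using an in-memory SQLite database. The `items` table, its columns, and the `fetch_items` helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 36)],  # 35 hypothetical rows
)


def fetch_items(conn: sqlite3.Connection, offset: int, limit: int) -> list:
    """Fetch one chunk of rows; no total count is needed."""
    # A stable ORDER BY matters: without it the row order is not guaranteed
    rows = conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return [{"id": row[0], "name": row[1]} for row in rows]


chunk = fetch_items(conn, offset=10, limit=10)      # ids 11..20
tail = fetch_items(conn, offset=30, limit=10)       # ids 31..35
past_end = fetch_items(conn, offset=40, limit=10)   # empty list
```

An importer can keep adding `limit` to `offset` until it gets back an empty list, without ever knowing how many pages exist.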

There are two main ways to show pagination data in the response.

{
  "data": [
    ...
  ],
  "meta": {
    "total": 1000,
    "limit": 10,
    "offset": 10
  }
}

Or with hypermedia controls in the JSON:

{
  "data": [
    ...
  ],
  "meta": {
    "total": 1000,
    "limit": 10,
    "offset": 10
  },
  "links": {
    "self": "/items?offset=10&limit=10",
    "next": "/items?offset=20&limit=10",
    "prev": "/items?offset=0&limit=10",
    "first": "/items?offset=0&limit=10",
    "last": "/items?offset=990&limit=10"
  }
}

Ease of Use

  • Pro: Simple to implement and understand.
  • Pro: Easily integrates with SQL LIMIT and OFFSET clauses.
  • Pro: Like page-based pagination this approach can also show next/previous buttons dynamically when it’s clear there are more records available.
  • Con: Does not help the UI build a list of pages if they want to show “Page 1, 2, … 20.” They can awkwardly do maths on the total / limit but it’s a bit weird.
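For a client that does want page numbers, the maths on total, limit, and offset might look like this. The `page_info` helper is hypothetical, and it assumes the offset is always a multiple of the limit.

```python
import math


def page_info(total: int, limit: int, offset: int) -> dict:
    """Derive page-style numbers from offset-style metadata."""
    total_pages = math.ceil(total / limit)
    return {
        "current_page": offset // limit + 1,  # offset 10 with limit 10 is page 2
        "total_pages": total_pages,
        # Offset of the final page, clamped to 0 for an empty collection
        "last_offset": max(0, (total_pages - 1) * limit),
    }


info = page_info(total=1000, limit=10, offset=10)
```

With the metadata from the example response this gives page 2 of 100 and a last offset of 990, which matches the `last` link shown earlier.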

Performance

  • Con: Can become inefficient with large datasets due to the need to scan through all previous records.
  • Con: Performance degradation is significant as the offset increases.

Consistency

  • Con: The same problems exist for offset pagination as for page pagination: if more data has been added between the first request and the second, the same record could show up in both pages.

Cursor-Based Pagination

Cursor-based pagination uses an opaque string (often a unique identifier) to mark the starting point for the next subsection of resources in the collection. It’s often more efficient and reliable for large datasets.

GET /items?cursor=abc123&limit=10

Here, abc123 represents the last item’s unique identifier from the previous page. This could be a UUID, but it can be more dynamic than that.

APIs like Slack will base64 encode information with a field name and a value, even adding sorting logic, all wrapped up in an opaque string. For example, dXNlcjpXMDdRQ1JQQTQ= would represent user:W07QCRPA4.

Obfuscating the information like this aims to stop API consumers hard-coding values for the pagination, which allows for the API to change pagination logic over time without breaking integrations. The consumers can simply pass the cursor around to do the job, without worrying about what it actually involves.
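A cursor in the `field:value` style described above can be built and unpacked with a few lines of standard-library code. The helper names here are hypothetical, and this is only a sketch of the idea rather than how Slack or any other API actually implements it.

```python
import base64


def encode_cursor(field: str, value: str) -> str:
    """Pack a field name and value into an opaque-looking string."""
    raw = f"{field}:{value}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> tuple[str, str]:
    """Unpack a cursor on the server; raises if it is not valid base64."""
    raw = base64.b64decode(cursor, validate=True).decode("utf-8")
    field, _, value = raw.partition(":")
    return field, value


cursor = encode_cursor("user", "W07QCRPA4")
field, value = decode_cursor(cursor)
```

Base64 is encoding, not encryption, so anybody can decode it. The point is to signal "do not build this yourself", not to hide anything, which means the server should still validate whatever comes back.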

It can look a bit like this:

{
  "data": [...],
  "next_cursor": "xyz789",
  "limit": 10
}

To save the client even having to think about cursors (or knowing the name of the query parameters for cursor or limit), links can once again save the day:

{
  "data": [
    ...
  ],
  "links": {
    "self": "/items?cursor=abc123&limit=10",
    "next": "/items?cursor=xyz789&limit=10",
    "prev": "/items?cursor=prevCursor&limit=10",
    "first": "/items?cursor=firstCursor&limit=10",
    "last": "/items?cursor=lastCursor&limit=10"
  }
}
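On the client side, following links can be as small as the loop below. It is a sketch that assumes the response shape shown above. The `fetch` callable stands in for whatever HTTP client is in use, and the canned responses are invented for the example.

```python
from typing import Callable, Iterator


def iterate_items(fetch: Callable[[str], dict], start_url: str) -> Iterator:
    """Yield every item in a collection by following "next" links."""
    url = start_url
    while url:
        body = fetch(url)
        if not body["data"]:
            break  # an empty page means there is nothing more to read
        yield from body["data"]
        url = body.get("links", {}).get("next")  # None ends the loop


# Hypothetical canned responses keyed by URL, in place of real HTTP calls
FAKE_RESPONSES = {
    "/items?limit=2": {
        "data": ["a", "b"],
        "links": {"next": "/items?cursor=b&limit=2"},
    },
    "/items?cursor=b&limit=2": {
        "data": ["c", "d"],
        "links": {"next": "/items?cursor=d&limit=2"},
    },
    "/items?cursor=d&limit=2": {"data": ["e"], "links": {}},
}

all_items = list(iterate_items(FAKE_RESPONSES.__getitem__, "/items?limit=2"))
```

The loop never inspects the cursor or builds a URL itself, so the same code would work unchanged against page-based or offset-based links.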

Ease of Use

  • Pro: API consumers don’t have to think about anything and the API can change the cursor logic.
  • Con: Slightly more complex to implement than offset-based pagination.
  • Con: API does not know if there are more records available after the last one in the dataset so has to show a next/previous link which may return no data. (You can grab limit+1 number of records to see if it’s there, but that’s a bit of a hack which could end up being slower. Benchmarks are your friend.)
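The limit+1 trick mentioned above can be sketched with SQLite like this. The `items` table and the `fetch_after` helper are made up, and the cursor is left as a plain id to keep the example short, where a real API would probably wrap it in an opaque string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 26)],  # 25 hypothetical rows
)


def fetch_after(conn: sqlite3.Connection, after_id: int, limit: int) -> dict:
    """Return rows after the cursor, plus a next cursor only if more rows exist."""
    rows = conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit + 1),  # ask for one extra row to peek ahead
    ).fetchall()
    has_more = len(rows) > limit
    page = rows[:limit]  # the extra row is never sent to the client
    return {
        "data": [{"id": row[0], "name": row[1]} for row in page],
        "next_cursor": str(page[-1][0]) if has_more else None,
    }


first = fetch_after(conn, after_id=0, limit=10)
second = fetch_after(conn, after_id=int(first["next_cursor"]), limit=10)
third = fetch_after(conn, after_id=int(second["next_cursor"]), limit=10)
```

Whether the extra row is worth fetching depends on the data source, so it is worth benchmarking against simply letting the final request come back empty.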

Performance

  • Pro: Generally more efficient than offset-based pagination depending on the data source being used.
  • Pro: Avoids the need to count records to perform any sort of maths, which means larger datasets can be paginated without the slowdown that counting and skipping records brings.

Consistency

  • Pro: Cursor-based pagination data remains consistent in more scenarios, even if new data is added or removed, because the cursor acts as a stable marker identifying a specific record in the dataset instead of “the 10th one” which might change between requests.
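To see the difference, here is a small simulation of a new record arriving between two cursor-based requests. It assumes a newest-first list of integer ids stands in for the database, with a hypothetical `fetch_before` helper that takes the last id seen as its cursor.

```python
from typing import Optional


def fetch_before(records: list, before_id: Optional[int], limit: int) -> list:
    """Newest first: return up to `limit` ids that are lower than the cursor."""
    older = [r for r in records if before_id is None or r < before_id]
    return older[:limit]


# Newest first: ids 30 down to 1
records = list(range(30, 0, -1))

first = fetch_before(records, before_id=None, limit=10)        # ids 30..21
records.insert(0, 31)                                          # a new record arrives
second = fetch_before(records, before_id=first[-1], limit=10)  # ids 20..11

overlap = set(first) & set(second)
```

The second request asks for "records older than 21" rather than "records 11 to 20 by position", so the new arrival at the top does not shift anything into the next page.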

Choosing a strategy

Choosing the right pagination strategy depends on the specific use case and dataset size.

Offset-based pagination is simple but may suffer from performance issues with large datasets.

Cursor-based pagination offers better performance and consistency for large datasets but comes with added complexity.

Page-based pagination is user-friendly but shares similar performance concerns with offset-based pagination.

Using links instead of putting metadata in the response allows for more flexibility over time with little-to-no impact on clients.

Where Should Pagination Go?

In all of these examples there’s been the choice between sending some metadata back for the client to construct their own pagination controls, or sending them links in JSON to avoid the faff.

Using links is probably the best approach, but they don’t have to go in the JSON response. Instead, use the more modern approach: RFC 8288: Web Linking.

Link: <https://api.example.com/items?page=1&size=10>; rel="first",
<https://api.example.com/items?page=3&size=10>; rel="next",
<https://api.example.com/items?page=100&size=10>; rel="last"
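A client can pull the relations out of a header like this with a small parser. The sketch below is deliberately minimal: it assumes each link has a quoted `rel` parameter directly after the URL, as in the example above, while RFC 8288 allows more variations than this handles. The function name is hypothetical.

```python
import re

# Matches <url>; rel="name" pairs; other link parameters are not handled
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_header(value: str) -> dict:
    """Map each rel name in a Link header to its URL."""
    links = {}
    for url, rels in LINK_PATTERN.findall(value):
        for rel in rels.split():  # one link may carry several rel names
            links[rel] = url
    return links


header = (
    '<https://api.example.com/items?page=1&size=10>; rel="first", '
    '<https://api.example.com/items?page=3&size=10>; rel="next", '
    '<https://api.example.com/items?page=100&size=10>; rel="last"'
)
links = parse_link_header(header)
next_url = links.get("next")  # None would mean there is no next page
```

For production use, an HTTP client library that already understands Link headers is likely a safer bet than a hand-rolled regular expression.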

Popping them into HTTP headers seems like the cleaner choice instead of littering responses with metadata. It’s also a slight performance improvement, because HTTP/2 adds header compression via HPACK.

As this is a common standard instead of a convention, generic HTTP clients like Ketting can pick this information up to provide a more seamless client experience.

Either way, pick the right pagination strategy for the data source, document it well with a dedicated guide in API documentation, and make sure it scales up with a realistic dataset instead of testing with a handful of records and assuming it scales.

Adding or drastically changing pagination later could be a whole mess of backwards compatibility breaks.

NOTE

Pagination can be tricky to work with for API clients, but Speakeasy SDKs can help out. Learn about adding pagination to your Speakeasy SDK.