
Returning resources & collections of data

In the context of REST/HTTP APIs, a resource represents a specific piece of data or object that can be accessed via a unique URI (Uniform Resource Identifier). This could be anything: a user, a blog post, a product, or an order. A collection, by contrast, is a group of resources: a list or set of all the items of a particular type.

Structuring URL paths for resources & collections

Resources and collections are both retrieved with the GET method. The established convention is to have a unique base URL for each type of resource in your API: /invoices, /transactions, etc. To retrieve the entire collection of resources, you make a GET request to the base URL: GET /invoices. To retrieve a specific resource, you add the identifier of that specific instance to the path: GET /invoices/645E79D9E14. In this case, the ID 645E79D9E14 uniquely identifies a specific invoice.

GET /invoices/645E79D9E14
{
  "id": "645E79D9E14",
  "invoiceNumber": "INV-2024-001",
  "customer": "Acme Corporation",
  "amountDue": 500.00,
  "dateDue": "2024-08-15",
  "dateIssued": "2024-08-01",
  "items": [
    {
      "description": "Consulting Services",
      "quantity": 10,
      "unitPrice": 50.00,
      "total": 500.00
    }
  ],
  "links": {
    "self": "/invoices/645E79D9E14",
    "customer": "/customers/acme-corporation",
    "payments": "/invoices/645E79D9E14/payments"
  }
}

The resource contains loads of data, including the customer name, an array of items on the invoice, various dates, and how much of the invoice is left to be paid.

It also has “links”, which point to related resources and collections. These could be pure data, or they could be actions: a “pay” link which lets you make a payment, a “send” link which sends the invoice, or the one we’ve gone with here, “payments”, which still lets you create a payment but also supports viewing a list of partial and failed payments.

What is a Collection?

Using the invoices example again, if you wanted the API to let users retrieve all invoices, you would have an /invoices collection:

GET /invoices
[
  {
    "id": "645E79D9E14",
    "invoiceNumber": "INV-2024-001",
    "customer": "Acme Corporation",
    "amountDue": 500.00,
    "dateDue": "2024-08-15"
  },
  {
    "id": "646D15F7838",
    "invoiceNumber": "INV-2024-002",
    "customer": "Monsters Inc.",
    "amountDue": 750.00,
    "dateDue": "2024-08-20"
  }
]

In JSON this collection is represented with an array, where each item in the list is a representation of a resource.

Usually the API returns some basic information about each resource in the collection, and it can also include links so the client can easily load up more data for each resource it’s interested in.
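On the server side, the one-base-URL-per-type convention means a router only has to tell two path shapes apart: the collection and a single resource within it. Here is a minimal Python sketch, assuming a hypothetical RESOURCE_TYPES set and URL-safe IDs; in practice a web framework’s router would usually do this for you.

```python
import re
from typing import Optional, Tuple

# Hypothetical set of resource types, one base URL each (/invoices, /transactions).
RESOURCE_TYPES = {"invoices", "transactions"}

# Matches "/<type>" or "/<type>/<id>", with an optional trailing slash.
# IDs are assumed to be URL-safe tokens like 645E79D9E14.
PATH_PATTERN = re.compile(r"^/(?P<type>[a-z]+)(?:/(?P<id>[A-Za-z0-9_-]+))?/?$")


def classify_path(path: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return ("collection", type, None), ("resource", type, id), or None if unknown."""
    match = PATH_PATTERN.match(path)
    if match is None or match.group("type") not in RESOURCE_TYPES:
        return None
    resource_id = match.group("id")
    kind = "resource" if resource_id else "collection"
    return kind, match.group("type"), resource_id


print(classify_path("/invoices"))              # ('collection', 'invoices', None)
print(classify_path("/invoices/645E79D9E14"))  # ('resource', 'invoices', '645E79D9E14')
print(classify_path("/widgets/1"))             # None
```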

The vast majority of web APIs are built like this, but how can anyone know where the resources are? They could guess, go off searching around the Internet for some API documentation, or you could just… tell them.

How do HTTP Methods fit in?

REST APIs typically use standard HTTP methods to interact with resources and collections:

GET: Retrieve data.

  • /posts - Get a collection of all blog posts.
  • /posts/abc1 - Get a single blog post by its ID.

POST: Create a new resource.

  • /posts - Add a new blog post to the collection.

PUT: Replace an entire existing resource.

  • /posts/abc1 - Replace the blog post with ID abc1.

PATCH: Update part of an existing resource.

  • /posts/abc1 - Update some of the fields on the blog post with ID abc1.

DELETE: Remove a resource.

  • /posts/abc1 - Delete the blog post with ID abc1.
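To make that mapping concrete, here is a framework-free Python sketch over an in-memory dict. The posts store and the function names are hypothetical stand-ins for real storage and route handlers; the main thing it shows is the difference between PUT (replace the whole resource) and PATCH (change only the fields that were sent).

```python
import uuid
from typing import Dict, List, Optional

# Hypothetical in-memory store standing in for a real database.
posts: Dict[str, dict] = {}


def list_posts() -> List[dict]:  # GET /posts
    return list(posts.values())


def get_post(post_id: str) -> Optional[dict]:  # GET /posts/{id}
    return posts.get(post_id)


def create_post(data: dict) -> dict:  # POST /posts
    post_id = uuid.uuid4().hex[:8]  # the server, not the client, assigns the ID
    posts[post_id] = {**data, "id": post_id}
    return posts[post_id]


def replace_post(post_id: str, data: dict) -> dict:  # PUT /posts/{id}
    # Replaces the whole representation: any field missing from `data` is gone.
    posts[post_id] = {**data, "id": post_id}
    return posts[post_id]


def update_post(post_id: str, changes: dict) -> dict:  # PATCH /posts/{id}
    # Touches only the fields that were sent. Raises KeyError for an unknown ID,
    # where a real API would respond with 404 Not Found.
    post = posts[post_id]
    post.update({k: v for k, v in changes.items() if k != "id"})
    return post


def delete_post(post_id: str) -> None:  # DELETE /posts/{id}
    posts.pop(post_id, None)


post = create_post({"title": "Understanding REST APIs", "author": "Bob Doe"})
update_post(post["id"], {"title": "Understanding REST"})  # author is kept
replace_post(post["id"], {"title": "REST, Revisited"})    # author is dropped
```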

APIs are about a whole lot more than just CRUD, but this mapping is a simple way to start thinking about collections and resources.

Best Practices

URI Structure

The structure of URIs in REST APIs is crucial for consistency and readability. Here are some common conventions.

Nouns over Verbs: URIs typically use nouns (like /posts) rather than verbs (like /getPosts), because HTTP methods (GET, POST, etc.) already imply the action.

Pluralization: Collections are usually plural (e.g. /posts), while resources are identified with a unique identifier (e.g. /posts/abc1).

Minimal Data in Collections: When retrieving a collection, APIs often return minimal information about each resource to save bandwidth and speed up responses. This allows you to quickly scan the collection and then retrieve more detailed information if needed.

GET /posts
[
  {
    "id": "abc1",
    "title": "Understanding REST APIs",
    "author": "Bob Doe",
    "link": "/posts/abc1"
  },
  {
    "id": "def2",
    "title": "Introduction to HTTP Methods",
    "author": "Sally Smith",
    "link": "/posts/def2"
  }
]

There’s plenty of debate about how much detail you should put in your collections.

If you put everything in there, you bloat the collections horrendously, wasting time, money, and carbon emissions by stressing your servers with massive JSON payloads.

If you trim them down to the bare minimum, you force consumers to make more requests to get even the most basic data.

Some even go as far as putting no information at all in their collections, because it can all be fetched directly from the resources. That way, if cached data changes, you never end up with the strange outcome of a collection and a resource showing different data.

GET /posts
[
  {
    "link": "/posts/abc1"
  },
  {
    "link": "/posts/def2"
  }
]

There is no one simple answer here, but if you are using a bit of common sense and talking to your consumers, you should be able to find something that works for you.

I generally strike a reasonable middle ground, where “summary” data is in the collection: name, ID, status, and a few key bits of data that you know, from talking to consumers, are the most important bits they want access to when they’re building an index of data.

Then if people want more data, they can go fetch it, but it’s up to them. There’s a lot we can do to make this more performant with sensible HTTP caching and better API design, but those are all topics for another guide.
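One way to implement that middle ground is to project each full resource down to an agreed set of summary fields, plus a link to the full version. A minimal Python sketch, where SUMMARY_FIELDS is a hypothetical list you would settle on after talking to your consumers:

```python
from typing import List

# Hypothetical summary fields agreed with consumers; everything else stays
# on the full resource at /posts/{id}.
SUMMARY_FIELDS = ("id", "title", "author")


def summarize(post: dict) -> dict:
    """Project a full post down to summary fields plus a link to the full resource."""
    summary = {field: post[field] for field in SUMMARY_FIELDS if field in post}
    summary["link"] = f"/posts/{post['id']}"
    return summary


def collection_response(posts: List[dict]) -> List[dict]:
    """Build the body for GET /posts from full post records."""
    return [summarize(post) for post in posts]


full_posts = [
    {
        "id": "abc1",
        "title": "Understanding REST APIs",
        "author": "Bob Doe",
        "content": "This is a detailed tutorial on REST APIs...",
        "datePublished": "2023-10-01",
    },
]
print(collection_response(full_posts))
# [{'id': 'abc1', 'title': 'Understanding REST APIs', 'author': 'Bob Doe', 'link': '/posts/abc1'}]
```

Keeping the field list in one place makes it a deliberate, reviewable decision rather than whatever the database happens to return.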

Linking to Related Resources

Collections linking to resources is helpful, letting clients follow various links throughout your API like a user browsing a website, but resources can link to other related resources and collections, which might be data but could also be considered “actions”, all handled through the same conventions.

GET /posts/abc1
{
  "id": "abc1",
  "title": "Understanding REST APIs",
  "author": "Jane Doe",
  "content": "This is a detailed tutorial on REST APIs...",
  "datePublished": "2023-10-01",
  "links": {
    "self": "/posts/abc1",
    "author": "/authors/jane-doe",
    "comments": "/posts/abc1/comments"
  }
}

In this response:

  • The self link points to the resource itself, like a canonical URL. This is a handy convention for knowing where something came from, even if you’re just looking at a JSON blob of it or it’s available on multiple URLs.

  • The author link points to the resource representing the author of the post. It’s quite likely you’ll want to load that, but it’s also going to have its own caching rules, so it makes no sense to squash that data into the post resource.

  • The comments link points to a collection of comments related to this post, if you want to load them. Any application loading them is likely to do so after the post is already showing to users, so it doesn’t matter if they load later.
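On the client side, following these links mostly means resolving them against the API’s base URL before making the next GET request. A short Python sketch using the standard library’s urllib.parse.urljoin; the https://api.example.com host is an assumption, since the example links are relative.

```python
from typing import Dict
from urllib.parse import urljoin

# Hypothetical API host; the links in the response are relative to it.
BASE_URL = "https://api.example.com"

post = {
    "id": "abc1",
    "links": {
        "self": "/posts/abc1",
        "author": "/authors/jane-doe",
        "comments": "/posts/abc1/comments",
    },
}


def resolve_links(resource: dict, base_url: str = BASE_URL) -> Dict[str, str]:
    """Turn each link into an absolute URL that a client could GET next."""
    return {rel: urljoin(base_url, href) for rel, href in resource.get("links", {}).items()}


absolute = resolve_links(post)
print(absolute["comments"])  # https://api.example.com/posts/abc1/comments
```

Absolute URLs, like the ones GitHub returns in its url fields, pass through urljoin unchanged, so the same helper copes with both styles.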

Splitting API data across multiple endpoints that clients can fetch as needed is really handy. It upgrades a REST API from basically a set of functions that grab some data into something more like an Object-Relational Mapping (ORM), where relationships can be navigated easily. But we can go a step further.

Later articles in the series will show you how to upgrade that ORM to a State Machine, so make sure you subscribe.

Don’t Confuse Resource Design & Database Design

A key aspect of API design is not tying your resources and collections directly to your database structure. Your database needs to be able to change and evolve rapidly as data structures change, but your API needs to evolve slowly (or not at all). The more tied your API consumers are to your internal database structure, the more they’re going to have to rewrite their applications.

So, the customer might show up in the invoice resource even though it’s in a separate table, and could be INNER JOIN’ed in the background (for those using SQL). Then if that query starts to get really slow, you could drop a level of normalization and bung the customer name directly into the invoices table. That even helps if the customer changes their name, because then you have a history of invoices with the names that were correct at the time. Either way, the invoice resource looks the same to clients.

There’s lots to think about, but the quick point here is to avoid letting your database design influence your resource design too heavily. Your clients should always come first.
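A small translation layer between storage and the API is one way to keep that separation. In this Python sketch the table and column names (invoices_table, amount_due_cents, customer_name, and so on) are hypothetical; the point is that clients only ever see the public field names, so the storage can be joined or denormalized without breaking them.

```python
from typing import Dict

# Hypothetical rows from two normalized tables; clients never see these names.
customers_table: Dict[int, dict] = {
    7: {"customer_id": 7, "legal_name": "Acme Corporation", "slug": "acme-corporation"},
}
invoices_table: Dict[str, dict] = {
    "645E79D9E14": {
        "invoice_id": "645E79D9E14",
        "number": "INV-2024-001",
        "customer_id": 7,
        "amount_due_cents": 50000,
    },
}


def invoice_resource(invoice_id: str) -> dict:
    """Build the public invoice representation from internal rows."""
    row = invoices_table[invoice_id]
    customer = customers_table[row["customer_id"]]  # the "JOIN", done in code here
    # If the name is later denormalized onto the invoice row, prefer that snapshot;
    # the public "customer" field looks the same either way.
    customer_name = row.get("customer_name") or customer["legal_name"]
    return {
        "id": row["invoice_id"],
        "invoiceNumber": row["number"],
        "customer": customer_name,
        "amountDue": row["amount_due_cents"] / 100,
        "links": {
            "self": f"/invoices/{row['invoice_id']}",
            "customer": f"/customers/{customer['slug']}",
        },
    }
```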

Real-World Examples

GitHub API

When retrieving a list of repositories, each repository item includes a url field that links to the full details of that repository.

GET /users/octocat/repos
[
  {
    "id": 1296269,
    "name": "Hello-World",
    "url": "https://api.github.com/repos/octocat/Hello-World"
  }
]

Twitter API

When retrieving a user’s timeline, each tweet includes a url that links to the specific tweet’s details.

GET /statuses/user_timeline.json?screen_name=speakeasydev
[
  {
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "id": 1050118621198921728,
    "text": "Just setting up my Twitter. #myfirstTweet",
    "url": "https://api.twitter.com/1.1/statuses/show/1050118621198921728.json"
  }
]

Stripe API

Stripe’s collections are a bit different: instead of returning a JSON array directly in the response, Stripe wraps it in an object with a data property:

GET /v1/charges
{
  "object": "list",
  "url": "/v1/charges",
  "has_more": false,
  "data": [
    {
      "id": "ch_3MmlLrLkdIwHu7ix0snN0B15",
      "object": "charge",
      "amount": 1099,
      "amount_captured": 1099,
      "amount_refunded": 0,
      "application": null,
      "application_fee": null,
      "application_fee_amount": null,
      "balance_transaction": "txn_3MmlLrLkdIwHu7ix0uke3Ezy",
      "billing_details": {
        "address": {
          "city": null,
          "country": null,
          "line1": null,
          "line2": null,
          "postal_code": null,
          "state": null
        },
        "email": null,
        "name": null,
        "phone": null
      },
      "calculated_statement_descriptor": "Stripe",
      "captured": true,
      "created": 1679090539,
      "currency": "usd",
      "customer": null,
      ... snipped because it’s HUGE ...
    },
    {...},
    {...}
  ]
}

They do this so they can add other bits of metadata, but much of this metadata comes down to pagination, which can be handled in other ways (like popping pagination into Link headers), so this practice is somewhat dying out.
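Clients that talk to both styles of API often end up normalizing the two shapes. A minimal Python sketch, assuming only the object, data, and has_more fields shown in the Stripe example (real Stripe responses carry more); the sample charge ID is made up for illustration.

```python
import json
from typing import List, Optional, Tuple, Union


def unwrap_collection(body: Union[list, dict]) -> Tuple[List[dict], Optional[bool]]:
    """Return (items, has_more) for a bare JSON array or a Stripe-style list object.

    has_more is None when the body says nothing about pagination, which is the
    case for bare arrays (that information would live in headers instead).
    """
    if isinstance(body, list):
        return body, None
    if isinstance(body, dict) and body.get("object") == "list":
        return body["data"], bool(body.get("has_more", False))
    raise ValueError("unrecognized collection shape")


bare = json.loads('[{"id": "645E79D9E14"}, {"id": "646D15F7838"}]')
wrapped = json.loads(
    '{"object": "list", "url": "/v1/charges", "has_more": false, "data": [{"id": "ch_1"}]}'
)
```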

Best Practices

Returning your resources and collections in a logical and consistent way is tough at first, but there are standards and best practices that can help you avoid common mistakes.

Using a “Data Envelope”

One common convention used by many popular APIs (like the Stripe example above) is to wrap data in some sort of “envelope”, which is a common term for putting it into another object so there’s a bit of room for metadata.

{
  "data": [
    {
      "id": 123,
      "name": "High Wood",
      "lat": 50.4645697,
      "lon": -4.4865975,
      "created_at": "2022-10-24T12:00:00Z"
    },
    {
      "id": 456,
      "name": "Goytre Hill",
      "lat": 52.1356114,
      "lon": -3.5975258,
      "created_at": "2024-12-01T09:00:00Z"
    }
  ],
  "meta": {
    "rate-limit": 100,
    "next": "/places?page=2"
  }
}

This was really popular for a long time, but we don’t need to do this anymore, because most of that metadata would be better off in a response header.

The move to headers may in part be down to HTTP/2 adding HPACK header compression, which makes it more efficient to use headers for anything that’s sensible to put in them. More standards are also popping up to move these concepts out of custom implementations in JSON and elsewhere, and into headers.

For example, instead of putting rate limiting data into meta, you can use the RateLimit header, and instead of putting pagination data into the response body, you can use the Link header.

HTTP/2 200
Content-Type: application/json
Cache-Control: public, max-age=18000
RateLimit: "default";r=100;t=60
Link: <https://api.example.com/places?page=1&size=10>; rel="first",
  <https://api.example.com/places?page=3&size=10>; rel="next",
  <https://api.example.com/places?page=100&size=10>; rel="last"

[
  {
    "id": 123,
    "name": "High Wood",
    "lat": 50.4645697,
    "lon": -4.4865975,
    "created_at": "2022-10-24T12:00:00Z"
  },
  {
    "id": 456,
    "name": "Goytre Hill",
    "lat": 52.1356114,
    "lon": -3.5975258,
    "created_at": "2024-12-01T09:00:00Z"
  }
]
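Producing a response like this is mostly string formatting. Here is a minimal Python sketch for page-number pagination, assuming the https://api.example.com/places URL from the example and a known last page; the Link value follows RFC 8288 (Web Linking) syntax. Unlike the example response, it also emits a prev link whenever there is a previous page.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

# Hypothetical collection URL, matching the example response.
BASE = "https://api.example.com/places"


def pagination_links(page: int, size: int, last_page: int) -> str:
    """Build a Link header value (RFC 8288 syntax) for page-number pagination."""

    def url(p: int) -> str:
        return f"{BASE}?{urlencode({'page': p, 'size': size})}"

    links = [(url(1), "first")]
    if page > 1:
        links.append((url(page - 1), "prev"))
    if page < last_page:
        links.append((url(page + 1), "next"))
    links.append((url(last_page), "last"))
    return ", ".join(f'<{href}>; rel="{rel}"' for href, rel in links)


def places_response(
    items: List[dict], page: int, size: int, last_page: int
) -> Tuple[Dict[str, str], List[dict]]:
    """Return (headers, body): metadata goes in headers, the body stays a bare array."""
    headers = {
        "Content-Type": "application/json",
        "Link": pagination_links(page, size, last_page),
    }
    return headers, items
```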

This may look easier to work with in some ways and harder in others, but it’s more performant, and any complexity can be deferred to standard libraries which handle it all for you and your clients.
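As an illustration of what those libraries do, here is a simplified Python parser for a Link header like the one above. It is a sketch, not a full RFC 8288 parser: it assumes rel is the first parameter of each link and carries a single value. In practice, the Python requests library already exposes parsed links as Response.links.

```python
import re
from typing import Dict

# Matches one "<url>; rel=value" pair; assumes rel is the first parameter.
LINK_PART = re.compile(r'<(?P<url>[^>]*)>\s*;\s*rel="?(?P<rel>[^";,]+)"?')


def parse_link_header(value: str) -> Dict[str, str]:
    """Map each rel to its URL (simplified: one rel value per link)."""
    return {m.group("rel"): m.group("url") for m in LINK_PART.finditer(value)}


header = (
    '<https://api.example.com/places?page=1&size=10>; rel="first", '
    '<https://api.example.com/places?page=3&size=10>; rel="next", '
    '<https://api.example.com/places?page=100&size=10>; rel="last"'
)
links = parse_link_header(header)
print(links["next"])  # https://api.example.com/places?page=3&size=10
```

A client can keep requesting links["next"] until the header no longer contains a next link, which replaces the next field that would otherwise sit in a meta envelope.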

Data Format Standards

Instead of creating your own custom format, it may be easier for you or your users to use an existing “data format” standard.

Using an existing standard can avoid “bikeshedding” (arguments about the pros and cons of each minor choice), and more importantly it will open the door to more standard tooling on both the client side and the server side.

Summary

Use Consistent Naming: Stick to conventions like using plural nouns for collections. It shouldn’t matter, but it drives people mad.

Keep it Simple: Start with basic endpoints and add complexity only when necessary. It’s easier to add things to an API if they’re needed later, than take them away once they’re in production.

API model is not a database model: Don’t try to recreate your database model over HTTP. It will be a big waste of time and almost immediately wrong, which makes clients upset.

By understanding and applying these concepts, you’ll be able to design and work with RESTful APIs effectively, ensuring that your API interactions are intuitive, efficient, and scalable.