JSONL responses in OpenAPI

JSON Lines (JSONL) is a convenient format for storing structured data that may be processed one record at a time. It’s a simple format where each line is a valid JSON value, typically a JSON object or array. JSONL is particularly useful for handling large datasets, streaming data, or log files where each line represents a separate record.

Understanding JSONL format

JSONL (also known as newline-delimited JSON) consists of a sequence of JSON values, usually objects, with one value per line. Each line must be a valid JSON value, and lines are separated by a newline character (\n). More details on the format can be found in the JSON Lines documentation.

Here’s an example of a JSONL file:

{"name": "Alice", "age": 30, "city": "New York"}
{"name": "Bob", "age": 25, "city": "San Francisco"}
{"name": "Charlie", "age": 35, "city": "Chicago"}

JSONL offers several advantages over traditional JSON:

Streaming: JSONL can be processed one line at a time, making it ideal for streaming applications.
Append-friendly: New records can be easily appended to the end of a JSONL file.
Memory-efficient: Processing JSONL doesn’t require loading the entire dataset into memory.
Parallelization: JSONL data can be easily split and processed in parallel.
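The streaming and append-friendly properties above can be sketched with Python's standard library. This is a minimal illustration, not a reference implementation: the helper names iter_jsonl and append_record are made up for this example, and an in-memory buffer stands in for a real file.

```python
import io
import json


def iter_jsonl(lines):
    """Yield one parsed JSON value per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, such as a trailing one
            yield json.loads(line)


def append_record(stream, record):
    """Append one record as a single line.

    json.dumps escapes newlines inside strings, so a record never spans lines.
    """
    stream.write(json.dumps(record) + "\n")


# An in-memory buffer stands in for a file opened in append mode.
buffer = io.StringIO()
append_record(buffer, {"name": "Alice", "age": 30, "city": "New York"})
append_record(buffer, {"name": "Bob", "age": 25, "city": "San Francisco"})

# Read the records back one line at a time.
buffer.seek(0)
names = [record["name"] for record in iter_jsonl(buffer)]
```

Because iter_jsonl is a generator, only one record is held in memory at a time, regardless of how many lines the source contains.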

Defining JSONL responses in OpenAPI documents

To define a JSONL response in OpenAPI, use the application/jsonl or text/jsonl media type. While JSONL isn’t natively supported in the OpenAPI specification, these content types indicate that the response body is in JSONL format, with the schema describing a single line rather than the whole body. Here’s an example of how to define a JSONL response in an OpenAPI document:

openapi.yaml

paths:
  /users/export:
    get:
      tags:
        - Users
      summary: Export user data in JSONL format
      description: >
        This endpoint returns user data in JSONL format, with each line containing a complete user record.
        This format is ideal for large datasets that need to be processed one record at a time.
      responses:
        '200':
          description: User data in JSONL format
          content:
            application/jsonl:
              schema:
                $ref: '#/components/schemas/User'
        '400':
          description: Invalid request
        '500':
          description: Internal server error
components:
  schemas:
    User:
      type: object
      required: [id, name, email]
      properties:
        id:
          type: string
          format: uuid
          description: Unique identifier for the user
        name:
          type: string
          description: User's full name
        email:
          type: string
          format: email
          description: User's email address
        age:
          type: integer
          description: User's age
        city:
          type: string
          description: User's city of residence

In this example, the /users/export endpoint returns user data in JSONL format. Each line of the response will be a valid JSON object representing a user, as defined by the User schema.
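To make the per-line contract concrete, here is a small sketch that checks each line of a response body against the required fields of the User schema. The sample body and the helper name check_user_line are invented for illustration, and the check covers only the presence of required fields, not types or formats.

```python
import json

# Mirrors the `required` list of the User schema above.
REQUIRED_FIELDS = ("id", "name", "email")


def check_user_line(line):
    """Parse one JSONL line and return the names of missing required fields."""
    user = json.loads(line)
    if not isinstance(user, dict):
        raise ValueError("each line must be a JSON object")
    return [field for field in REQUIRED_FIELDS if field not in user]


# Made-up sample body: the second record lacks an email address.
sample_body = (
    '{"id": "123e4567-e89b-12d3-a456-426614174000", "name": "Alice", '
    '"email": "alice@example.com"}\n'
    '{"id": "123e4567-e89b-12d3-a456-426614174001", "name": "Bob"}\n'
)

# Map 1-based line numbers to the fields missing on that line.
problems = {}
for number, line in enumerate(sample_body.splitlines(), start=1):
    missing = check_user_line(line)
    if missing:
        problems[number] = missing
```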

Client-side handling of JSONL responses

When working with JSONL responses, clients need to process the data line by line. Here’s an example of how to handle JSONL responses using a Python SDK generated by Speakeasy:

from openapi import SDK

with SDK() as sdk:
    res = sdk.users.get_users_export()

    with res as jsonl_stream:
        for event in jsonl_stream:
            # handle event
            print(f"User: {event['name']}, Email: {event['email']}")

In this example, the SDK handles the streaming of JSONL data, allowing you to process each record as it arrives. The context manager (with res as jsonl_stream) ensures proper resource cleanup after processing.
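For clients that don't use a generated SDK, the same line-by-line processing can be approximated by hand. The sketch below assumes the HTTP client exposes the response body as an iterable of byte chunks; the function name iter_jsonl_chunks is hypothetical and this is not how the SDK itself is implemented. The key point is that chunk boundaries are arbitrary, so a record can be split across two chunks and must be buffered until its newline arrives.

```python
import json


def iter_jsonl_chunks(chunks):
    """Yield parsed records from an iterable of byte chunks.

    Incomplete data is buffered until the newline that ends it arrives.
    """
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is complete; the rest waits.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            if line.strip():
                yield json.loads(line)
    if pending.strip():  # final record without a trailing newline
        yield json.loads(pending)


# Simulated network chunks that split records at awkward places.
chunks = [
    b'{"name": "Al',
    b'ice"}\n{"name": ',
    b'"Bob"}\n',
    b'{"name": "Charlie"}',
]
names = [record["name"] for record in iter_jsonl_chunks(chunks)]
```

Splitting on the newline byte is safe for UTF-8 input because that byte never occurs inside a multi-byte character.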

Best practices for JSONL API design

When designing APIs that return JSONL responses, consider the following best practices:

Use appropriate content types

Use the application/jsonl or text/jsonl content type to clearly indicate that the response is in JSONL format. This helps clients understand how to process the response correctly.

Include clear documentation

Provide clear documentation about the JSONL format and how clients should process it. Include examples of the response format and client-side code for handling JSONL data.

Consider pagination for large datasets

Even though JSONL is efficient for streaming large datasets, consider implementing pagination to allow clients to request smaller chunks of data. This can be done using query parameters like limit and offset.

openapi.yaml

paths:
  /users/export:
    get:
      parameters:
        - name: limit
          in: query
          description: Maximum number of users to return
          schema:
            type: integer
            default: 100
        - name: offset
          in: query
          description: Number of users to skip
          schema:
            type: integer
            default: 0
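
On the client side, paging through such an endpoint might look like the following sketch. The fetch_page function is a hypothetical stand-in for a GET /users/export request with limit and offset query parameters; here it slices an in-memory list so the example stays self-contained. The loop stops when a page returns fewer records than the requested limit.

```python
import json


def fetch_page(records, limit=100, offset=0):
    """Hypothetical stand-in for the HTTP request; returns a JSONL body."""
    page = records[offset:offset + limit]
    return "".join(json.dumps(record) + "\n" for record in page)


def export_all(records, limit=100):
    """Yield every record by requesting successive pages."""
    offset = 0
    while True:
        body = fetch_page(records, limit=limit, offset=offset)
        page = [json.loads(line) for line in body.splitlines()]
        yield from page
        if len(page) < limit:  # a short page means there is nothing left
            break
        offset += limit


# Made-up data: five records fetched two at a time.
users = [{"id": i} for i in range(5)]
exported = list(export_all(users, limit=2))
```

Offset-based paging is the simplest option to describe, though results can shift between requests if the underlying data changes while a client is paging.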