JSONL responses in OpenAPI
JSON Lines (JSONL) is a convenient format for storing structured data that may be processed one record at a time. It’s a simple format where each line is a valid JSON value, typically a JSON object or array. JSONL is particularly useful for handling large datasets, streaming data, or log files where each line represents a separate record.
Understanding JSONL format
JSONL (also known as newline-delimited JSON) consists of multiple JSON objects, with each object on a separate line. Each line must be a valid JSON value, and lines are separated by a newline character (\n). More details on the format can be found in the JSON Lines documentation.
Here’s an example of a JSONL file:
{"name": "Alice", "age": 30, "city": "New York"}
{"name": "Bob", "age": 25, "city": "San Francisco"}
{"name": "Charlie", "age": 35, "city": "Chicago"}
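To make the "one JSON value per line" rule concrete, here is a minimal Python sketch that parses the three records shown above. The helper name parse_jsonl is hypothetical. The sketch splits on "\n" only, since that is the separator the format defines.

```python
import json

# Sample data mirroring the three-record example above.
JSONL_TEXT = (
    '{"name": "Alice", "age": 30, "city": "New York"}\n'
    '{"name": "Bob", "age": 25, "city": "San Francisco"}\n'
    '{"name": "Charlie", "age": 35, "city": "Chicago"}\n'
)


def parse_jsonl(text):
    """Parse a JSONL string into a list of Python values, one per non-blank line."""
    records = []
    # Split on "\n" only: str.splitlines() would also break on characters
    # such as U+2028, which may legally appear inside a JSON string.
    for line in text.split("\n"):
        if line.strip():  # skip the empty piece after the final newline
            records.append(json.loads(line))
    return records


records = parse_jsonl(JSONL_TEXT)
print([record["name"] for record in records])
```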
JSONL offers several advantages over traditional JSON:
- Streaming: JSONL can be processed one line at a time, making it ideal for streaming applications.
- Append-friendly: New records can be easily appended to the end of a JSONL file.
- Memory-efficient: Processing JSONL doesn’t require loading the entire dataset into memory.
- Parallelization: JSONL data can be easily split and processed in parallel.
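The append-friendly and memory-efficient properties listed above can be illustrated with a short sketch. The helper names append_record and iter_records and the file name users.jsonl are assumptions made for this example only.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a single line; existing lines are left untouched."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def iter_records(path):
    """Yield records one at a time so only the current line is held in memory."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Hypothetical file in a temporary directory, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "users.jsonl")
append_record(path, {"name": "Alice", "age": 30})
append_record(path, {"name": "Bob", "age": 25})

# The generator is consumed lazily; the full file is never loaded at once.
total_age = sum(record["age"] for record in iter_records(path))
print(total_age)
```

Because each record is self-contained, a new record is added with a plain append, without rewriting the file as a single JSON array would require.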
Defining JSONL responses in OpenAPI documents
JSONL responses can be defined in OpenAPI by using the application/jsonl or text/jsonl MIME type. While JSONL isn’t natively supported in the OpenAPI specification, you can use these content types to indicate that the response will be in JSONL format.
Here’s an example of how to define a JSONL response in an OpenAPI document:
paths:
  /users/export:
    get:
      tags:
        - Users
      summary: Export user data in JSONL format
      description: >
        This endpoint returns user data in JSONL format, with each line containing a complete user record.
        This format is ideal for large datasets that need to be processed one record at a time.
      responses:
        '200':
          description: User data in JSONL format
          content:
            application/jsonl:
              schema:
                $ref: '#/components/schemas/User'
        '400':
          description: Invalid request
        '500':
          description: Internal server error
components:
  schemas:
    User:
      type: object
      required: [id, name, email]
      properties:
        id:
          type: string
          format: uuid
          description: Unique identifier for the user
        name:
          type: string
          description: User's full name
        email:
          type: string
          format: email
          description: User's email address
        age:
          type: integer
          description: User's age
        city:
          type: string
          description: User's city of residence
In this example, the /users/export endpoint returns user data in JSONL format. Each line of the response will be a valid JSON object representing a user, as defined by the User schema.
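On the producing side, a response body of this shape could be assembled roughly as follows. This is a minimal sketch using only the Python standard library. The helper name to_jsonl_lines and the sample records are hypothetical, and the code is not tied to any particular web framework.

```python
import json

# Mirrors the `required` list of the User schema above.
REQUIRED_FIELDS = ("id", "name", "email")


def to_jsonl_lines(users):
    """Yield one newline-terminated JSON document per user record."""
    for user in users:
        missing = [field for field in REQUIRED_FIELDS if field not in user]
        if missing:
            raise ValueError(f"user record is missing required fields: {missing}")
        # json.dumps without `indent` never emits a raw newline,
        # so each record stays on a single line.
        yield json.dumps(user) + "\n"


# Hypothetical sample records for illustration.
sample_users = [
    {"id": "11111111-1111-1111-1111-111111111111", "name": "Alice", "email": "alice@example.com"},
    {"id": "22222222-2222-2222-2222-222222222222", "name": "Bob", "email": "bob@example.com", "age": 25},
]

body = "".join(to_jsonl_lines(sample_users))
print(body, end="")
```

Yielding lines from a generator lets a server write each record to the response as soon as it is ready, rather than building the whole body first.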
Client-side handling of JSONL responses
When working with JSONL responses, clients need to process the data line by line. Here’s an example of how to handle JSONL responses using a Python SDK generated by Speakeasy:
from openapi import SDK

with SDK() as sdk:
    res = sdk.users.get_users_export()

    with res as jsonl_stream:
        for event in jsonl_stream:
            # handle event
            print(f"User: {event['name']}, Email: {event['email']}")
In this example, the SDK handles the streaming of JSONL data, allowing you to process each record as it arrives. The context manager (with res as jsonl_stream) ensures proper resource cleanup after processing.
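If a generated SDK is not available, the same line-by-line processing can be sketched by hand. The hypothetical helper iter_jsonl below assumes the HTTP client exposes the response body as an iterable of byte chunks. It buffers partial lines because a chunk boundary can fall in the middle of a record.

```python
import json


def iter_jsonl(chunks):
    """Yield parsed records from an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; the remainder
        # stays buffered until more data arrives.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    if buffer.strip():  # final record sent without a trailing newline
        yield json.loads(buffer)


# Simulated network chunks that split records at arbitrary positions.
chunks = [
    b'{"name": "Alice", "em',
    b'ail": "alice@example.com"}\n{"name": "Bob", ',
    b'"email": "bob@example.com"}',
]

users = list(iter_jsonl(chunks))
for user in users:
    print(f"User: {user['name']}, Email: {user['email']}")
```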
Best practices for JSONL API design
When designing APIs that return JSONL responses, consider the following best practices:
Use appropriate content types
Use the application/jsonl or text/jsonl content type to clearly indicate that the response is in JSONL format. This helps clients understand how to process the response correctly.
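A client can use the declared content type to decide how to parse a response. The sketch below is one possible check. The function name is_jsonl is hypothetical, and the check covers only the two media types mentioned in this article.

```python
JSONL_MEDIA_TYPES = {"application/jsonl", "text/jsonl"}


def is_jsonl(content_type):
    """Return True if a Content-Type header value names a JSONL media type."""
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in JSONL_MEDIA_TYPES


print(is_jsonl("application/jsonl; charset=utf-8"))
print(is_jsonl("application/json"))
```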
Include clear documentation
Provide clear documentation about the JSONL format and how clients should process it. Include examples of the response format and client-side code for handling JSONL data.
Consider pagination for large datasets
Even though JSONL is efficient for streaming large datasets, consider implementing pagination to allow clients to request smaller chunks of data. This can be done using query parameters like limit and offset.
paths:
  /users/export:
    get:
      parameters:
        - name: limit
          in: query
          description: Maximum number of users to return
          schema:
            type: integer
            default: 100
        - name: offset
          in: query
          description: Number of users to skip
          schema:
            type: integer
            default: 0
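From the client's point of view, limit and offset pagination over a JSONL endpoint could look like the following sketch. The fetch_page callable is a hypothetical stand-in for an HTTP request to /users/export, and it is replaced here by an in-memory fake so the example runs on its own. The loop assumes that a page shorter than limit marks the end of the data.

```python
import json


def fetch_all(fetch_page, limit=100):
    """Yield every record by requesting pages until a short page is returned."""
    offset = 0
    while True:
        body = fetch_page(limit=limit, offset=offset)
        records = [json.loads(line) for line in body.split("\n") if line.strip()]
        yield from records
        if len(records) < limit:  # a short (or empty) page means no more data
            return
        offset += limit


# In-memory stand-in for the endpoint (hypothetical data).
USERS = [{"id": i} for i in range(5)]


def fake_fetch_page(limit, offset):
    """Return one page of USERS serialized as JSONL."""
    page = USERS[offset:offset + limit]
    return "".join(json.dumps(user) + "\n" for user in page)


ids = [user["id"] for user in fetch_all(fake_fetch_page, limit=2)]
print(ids)
```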