Product Updates
Introducing Speakeasy Suggest - Automatic OpenAPI Spec Maintenance
Anuraag Nalluri
September 25, 2023
Speakeasy Suggest, our AI-powered tool for automatically improving OpenAPI specs, is available now through the Speakeasy CLI or as a GitHub workflow. Provide your own OpenAI API key, or simply use ours by default. We’d love to hear your feedback in our Slack community!

Stay tuned for more posts covering Suggest output as GitHub PRs when configured as a workflow, interesting results on a variety of specs, and the downstream effects on Speakeasy-generated SDKs!
✨ We plan to open-source the core of our LLM-based agent for general-purpose use. Its primary function will be to serve as a JSON document transformer that takes a custom validation function, a list of errors, and the original document as inputs.
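As a hypothetical sketch of what that interface might look like (the names here are illustrative, not a committed API):

```typescript
// Hypothetical shape of the planned transformer's inputs (illustrative only).
interface ValidationError {
  message: string; // fixed error text, e.g. "duplicate operationId fields within same tag"
  line: number;    // 1-based line number in the document
}

// A caller-supplied validation function, re-run after each applied fix.
type ValidateFn = (document: unknown) => ValidationError[];

interface TransformInput {
  document: unknown;         // the original JSON document
  errors: ValidationError[]; // the current list of validation errors
  validate: ValidateFn;      // custom validation function
}
```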
Background
At Speakeasy, we’ve been automating the creation of high-quality developer surfaces for API companies around the world. These include SDKs across multiple languages and even Terraform providers. Today, our generation logic is built on top of OpenAPI, requiring users to supply an OpenAPI schema to effectively use our tooling and generate these surfaces.
Unfortunately, OpenAPI is a complicated specification with lots of potential for misconfiguration and unnecessary repetition. For example, the following schema operations contain duplicate operationIDs (i.e., `listDrinks`), duplicate inline schema objects, and no examples.
```yaml
/drinks:
  get:
    operationId: listDrinks
    summary: Get a list of drinks.
    description: Get a list of drinks, if authenticated this will include stock levels and product codes otherwise it will only include public information.
    security:
      - {}
    tags:
      - drinks
    parameters:
      - name: drinkType
        in: query
        description: The type of drink to filter by. If not provided all drinks will be returned.
        required: false
        schema:
          $ref: "#/components/schemas/DrinkType"
    responses:
      "200":
        description: A list of drinks.
        content:
          application/json:
            schema:
              type: array
              items:
                type: object
                properties:
                  name:
                    description: The name of the drink.
                    type: string
                  type:
                    $ref: "#/components/schemas/DrinkType"
                  price:
                    description: The price of one unit of the drink in US cents.
                    type: number
                  stock:
                    description: The number of units of the drink in stock, only available when authenticated.
                    type: integer
                    readOnly: true
                  productCode:
                    description: The product code of the drink, only available when authenticated.
                    type: string
                required:
                  - name
                  - price
/drink/{name}:
  get:
    operationId: listDrinks
    summary: Get a drink.
    description: Get a drink by name, if authenticated this will include stock levels and product codes otherwise it will only include public information.
    tags:
      - drinks
    parameters:
      - name: name
        in: path
        required: true
        schema:
          type: string
    responses:
      "200":
        description: A drink.
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  description: The name of the drink.
                  type: string
                type:
                  $ref: "#/components/schemas/DrinkType"
                price:
                  description: The price of one unit of the drink in US cents.
                  type: number
                stock:
                  description: The number of units of the drink in stock, only available when authenticated.
                  type: integer
                  readOnly: true
                productCode:
                  description: The product code of the drink, only available when authenticated.
                  type: string
              required:
                - name
                - price
```
As a result of this complexity, many companies create a spec to benefit from the vast ecosystem of OpenAPI tooling (e.g., docs providers), but they don’t have the resources or processes in place to maintain it as a first-class asset.
At Speakeasy, we’ve worked with many users’ specs to ensure they are compliant, clean, and accurately depict the state of their API. By leveraging our OpenAPI expertise and custom spec extensions, we’ve been able to create specs that produce idiomatic and human-readable SDKs for dozens of companies. But wouldn’t it be great if we could automate this? Enter Speakeasy Suggest.
Building Speakeasy Suggest
What is it?
Speakeasy Suggest is our LLM-based agent that, given an OpenAPI document, automatically suggests fixes, applies them, and outputs the modified spec. Using AI, we’re able to offload the burden of spec management from API producers. Today, Suggest can be invoked through both the Speakeasy CLI (which writes the modified spec to the local filesystem) and our GitHub Action (which creates a PR).
Applying Suggest to the invalid spec above produces a valid document with unique operationIDs, reuse of common objects where applicable, and better examples.
```yaml
components:
  schemas:
    ...
    Drink:
      type: object
      properties:
        name:
          description: The name of the drink.
          type: string
        type:
          $ref: "#/components/schemas/DrinkType"
        price:
          description: The price of one unit of the drink in US cents.
          type: number
        stock:
          description: The number of units of the drink in stock, only available when authenticated.
          type: integer
          readOnly: true
        productCode:
          description: The product code of the drink, only available when authenticated.
          type: string
      required:
        - name
        - price
      example:
        name: Old Fashioned
        type: cocktail
        price: 1000
        stock: 50
        productCode: EX123
...
/drinks:
  get:
    operationId: listDrinks
    summary: Get a list of drinks.
    description: Get a list of drinks, if authenticated this will include stock levels and product codes otherwise it will only include public information.
    security:
      - {}
    tags:
      - drinks
    parameters:
      - name: drinkType
        in: query
        description: The type of drink to filter by. If not provided all drinks will be returned.
        required: false
        schema:
          $ref: "#/components/schemas/DrinkType"
    responses:
      "200":
        description: A list of drinks.
        content:
          application/json:
            schema:
              type: array
              items:
                $ref: "#/components/schemas/Drink"
...
/drink/{name}:
  get:
    operationId: getDrink
    summary: Get a drink.
    description: Get a drink by name, if authenticated this will include stock levels and product codes otherwise it will only include public information.
    tags:
      - drinks
    parameters:
      - name: name
        in: path
        required: true
        schema:
          type: string
          example: CocaCola
    responses:
      "200":
        description: A drink.
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/Drink"
...
```
For the remainder of this post, we’d like to delve deeper into the decision-making and engineering challenges behind Speakeasy Suggest. To do that, we’ll start from the beginning.
Good Input = Good Output
Just as a polished OpenAPI spec is needed to generate high-quality SDKs, an LLM requires input with sufficient specificity and detail to produce useful suggestions for a document. It quickly became obvious to us that dumping a spec into the prompt and naively asking an LLM to “make it better” would not be fruitful.
We found that our in-house validator (i.e., `speakeasy validate` from our CLI) was effective at adding more specific, deterministic context to our LLM prompt. Our validator is built using vacuum, enabling us to specify custom rules. Since these rules are defined by us, they output fixed error messages and line numbers for an array of spec issues. Below is a table highlighting a few of the errors, the suggestions we’d hope to receive from the LLM when including them in the prompt, and the resulting effects on Speakeasy-generated SDKs when those suggestions are applied to the OpenAPI spec.
| Validation Error | Suggestion | Explanation |
| --- | --- | --- |
| Duplicate operationId fields within the same tag | Change the value of the offending operationIds | The SDK compiles without conflicting method names |
| Duplicate inline object schemas | Move the schema to a shared component | Much shorter, more human-readable class names |
| Missing examples | Add examples for parameters, request bodies, responses, etc. | Generated usage snippets look more human-written |
Each of the validation errors above is present in our invalid spec, so let’s examine the generated Go SDK code before and after running Speakeasy Suggest on the document.
Before
```go
// ListDrinks - Get a drink.
// Get a drink by name, if authenticated this will include stock levels and product codes otherwise it will only include public information.
func (s *drinks) ListDrinks(ctx context.Context, request operations.ListDrinksRequest) (*operations.ListDrinksResponse, error) {
	...
	switch {
	case httpRes.StatusCode == 200:
		switch {
		case utils.MatchContentType(contentType, `application/json`):
			var out *operations.ListDrinks200ApplicationJSON
			if err := utils.UnmarshalJsonFromResponseBody(bytes.NewBuffer(rawBody), &out); err != nil {
				return nil, err
			}
			res.ListDrinks200ApplicationJSONObject = out
			...
		}
		...
	}
	return res, nil
}

// ListDrinks - Get a list of drinks.
// Get a list of drinks, if authenticated this will include stock levels and product codes otherwise it will only include public information.
func (s *drinks) ListDrinks(ctx context.Context, request operations.ListDrinksRequest) (*operations.ListDrinksResponse, error) {
	...
	switch {
	case httpRes.StatusCode == 200:
		switch {
		case utils.MatchContentType(contentType, `application/json`):
			var out []operations.ListDrinks200ApplicationJSON
			if err := utils.UnmarshalJsonFromResponseBody(bytes.NewBuffer(rawBody), &out); err != nil {
				return nil, err
			}
			res.ListDrinks200ApplicationJSONObjects = out
			...
		}
		...
	}
	return res, nil
}
```
After
```go
// GetDrink - Get a drink.
// Get a drink by name, if authenticated this will include stock levels and product codes otherwise it will only include public information.
func (s *drinks) GetDrink(ctx context.Context, request operations.GetDrinkRequest) (*operations.GetDrinkResponse, error) {
	...
	switch {
	case httpRes.StatusCode == 200:
		switch {
		case utils.MatchContentType(contentType, `application/json`):
			var out *shared.Drink
			if err := utils.UnmarshalJsonFromResponseBody(bytes.NewBuffer(rawBody), &out); err != nil {
				return nil, err
			}
			res.Drink = out
			...
		}
		...
	}
	return res, nil
}

// ListDrinks - Get a list of drinks.
// Get a list of drinks, if authenticated this will include stock levels and product codes otherwise it will only include public information.
func (s *drinks) ListDrinks(ctx context.Context, request operations.ListDrinksRequest) (*operations.ListDrinksResponse, error) {
	...
	switch {
	case httpRes.StatusCode == 200:
		switch {
		case utils.MatchContentType(contentType, `application/json`):
			var out []shared.Drink
			if err := utils.UnmarshalJsonFromResponseBody(bytes.NewBuffer(rawBody), &out); err != nil {
				return nil, err
			}
			res.Drinks = out
			...
		}
		...
	}
	return res, nil
}
```
The code in “Before” would not even compile due to the duplicate method names produced by the conflicting operationIDs. Normally we would refuse to generate the SDK at all and throw a validation error instead; we’ve generated it here simply to show the differences in output. Also, notice the stark difference in the generated class names that successful responses are deserialized into (i.e., `ListDrinks200ApplicationJSON` vs. `Drink`).
Let’s also examine the differences in the README usage snippets your users will reference in their implementations.
Before
```go
...
func main() {
	s := sdk.New(
		sdk.WithSecurity(shared.Security{
			APIKey: "",
		}),
	)

	ctx := context.Background()
	res, err := s.Drinks.ListDrink(ctx, operations.ListDrinkRequest{
		Name: "Willie Gulgowski DVM",
	})
	if err != nil {
		log.Fatal(err)
	}

	if res.ListDrink200ApplicationJSONObject != nil {
		// handle response
	}
}
...
```
After
```go
...
func main() {
	s := sdk.New(
		sdk.WithSecurity(shared.Security{
			APIKey: "",
		}),
	)

	ctx := context.Background()
	res, err := s.Drinks.GetDrink(ctx, operations.GetDrinkRequest{
		Name: "CocaCola",
	})
	if err != nil {
		log.Fatal(err)
	}

	if res.Drink != nil {
		// handle response
	}
}
...
```
In addition to better method names and more human-readable class names, the example value for the drink name parameter is much more relevant. This makes the generated SDK easier for end users to consume and integrate into their applications.
The validation errors we’ve applied suggestions for are just a small sample of the improvements we can make to an OpenAPI spec. Including them in the prompt creates a feedback loop: even non-critical errors that don’t block SDK generation are worth surfacing if addressing them improves the quality of the SDK output.
Let’s delve into how we receive these suggestions from the LLM in a format that allows us to apply them to the spec.
Structured Output Parser
In our initial implementation of Suggest, we focused only on reporting suggestions rather than applying them. Unfortunately, this requires a human in the loop to copy-paste the suggestions themselves. To avoid potential errors and truly automate spec management, we needed to be involved in applying the suggestions to the input document as well. This would enable us to revalidate the document after each applied fix and ensure that overall document correctness was improving.
Modifying an OpenAPI schema requires receiving structured output from the LLM that we can apply to the document. We decided to use JSON Patch as the format for the suggestion itself. Using langchain-js, this involved defining a custom agent with an output parser that enforces the zod schema we pass in, ensuring we get LLM output in the right format.
````typescript
// zod schema
export const SUGGESTION_SCHEMA = z.object({
  suggested_fix: z
    .string()
    .describe(
      `a simple English description of the suggested JSON patch. For example, the JSON patch [{"op": "replace", "path": "/paths/~1pet/post/operationId", "value": "addPet"}] could be described as "Replace the operationId of the POST /pet endpoint with addPet`
    ),
  json_patch: z.string().describe(`suggested fix in the form of a JSON Patch`),
  reasoning: z
    .string()
    .describe(
      "a brief explanation of why you think the suggested fix will resolve the error"
    ),
});

...

// output parser which verifies the LLM's final response is formatted according to the zod schema
class JsonOutputParser<T> extends AgentActionOutputParser {
  lc_namespace = ["langchain", "agents", "custom_llm_agent"];
  schema: z.ZodType<T>;

  constructor(schema: z.ZodType<T>) {
    super();
    this.schema = schema;
  }

  async parse(text: string): Promise<AgentAction | AgentFinish> {
    console.log("\n" + text);
    const answerMatch = /```json(.*)```/gms.exec(text);
    if (answerMatch) {
      try {
        const json = JSON.parse(answerMatch[1].trim());
        // This will throw an error if the json does not match the schema.
        this.schema.parse(json);
        return {
          log: "Got answer",
          returnValues: json,
        };
      } catch (e) {
        console.log(
          "Answer is invalid JSON or does not match schema. Trying again..."
        );
        return {
          tool: "logger",
          toolInput: `Error: ${e}`,
          log: `Final answer is invalid. It does not conform to the requested output schema. Output schema: ${JSON.stringify(
            zodToJsonSchema(this.schema)
          )}`,
        };
      }
    }
    ...
  }
  ...
}
````
This ensured responses were in a form that made it easy to extract the JSON patch. However, we soon found that our agent would hallucinate invalid JSON patches referencing keys that didn’t exist in the OpenAPI document. This required augmenting our agent with another tool that verifies the validity of these patches by checking whether they can actually be applied.
```typescript
export class JsonPatchValidator extends DynamicTool {
  static description = `Use this to verify that a given JSON patch is valid.
Always use this tool to check the validity of your final suggestion.
If your suggestion is invalid, use the error message to try to fix it.
Do not return anything this tool says is invalid as your answer.
This tool will return "valid" if the patch is valid, or an error message if it is not.
Example input to this tool: [{{ "op": "replace", "path": "/baz", "value": "boo" }}]
Example output: "valid", "error: path /baz does not exist"`;

  constructor(spec: JSON) {
    super({
      name: "json_patch_valid",
      func: async (input) => {
        input = removeQuotes(input);
        try {
          apply_patch(spec, input);
          return "valid";
        } catch (e: any) {
          return `error: ${e}`;
        }
      },
      description: JsonPatchValidator.description,
    });
  }
}
```
💡 For a YAML spec, Speakeasy Suggest takes extra caution by converting it to JSON before applying the patch, then converting it back to YAML while preserving the original key ordering.
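For illustration, here’s a minimal sketch of that round trip, assuming js-yaml and fast-json-patch rather than Speakeasy’s actual implementation:

```typescript
import { load, dump } from "js-yaml";
import { applyPatch, Operation } from "fast-json-patch";

// Parse the YAML spec to a JS object, apply the suggested JSON patch, and
// dump it back to YAML. js-yaml parses mappings into plain JS objects, whose
// string keys preserve insertion order, so the original key ordering survives.
export function patchYamlSpec(yamlText: string, patch: Operation[]): string {
  const doc = load(yamlText) as object;
  const { newDocument } = applyPatch(doc, patch, /* validateOperation */ true);
  return dump(newDocument, { noRefs: true });
}
```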
We’ll discuss several of the tools we equipped our agent with in the next section, where we detail the decision-making process behind moving forward with a custom agent.
Benchmarking Different Approaches
There were two main approaches we tested: a custom agent vs. retrieval-augmented generation (RAG). To avoid exceeding GPT-4’s token limits, we provide the LLM prompt with only a small range of lines surrounding the error in the spec (see the sketch after this list). As a result, both approaches need to enable the LLM to gather information from the full spec in order to return valid JSON patches. The primary differentiator is how this is achieved:
- An agent utilizes tools to enable the LLM to execute custom logic to traverse the spec
- In RAG, we use external data to augment our prompts. In our case, this involves the standard method of persisting embeddings of the spec in a vector store, which can be quickly queried for relevant context when answering the prompt.
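For illustration, here’s a sketch (ours, not Speakeasy’s code) of how an error-centered window of spec lines might be extracted while keeping line numbers intact:

```typescript
// Return `radius` lines on either side of a validation error, with 1-based
// line numbers prefixed so the LLM can reference exact locations in patches.
export function specWindow(specText: string, errorLine: number, radius = 10): string {
  const lines = specText.split("\n");
  const start = Math.max(0, errorLine - 1 - radius);
  const end = Math.min(lines.length, errorLine + radius);
  return lines
    .slice(start, end)
    .map((line, i) => `${start + i + 1}: ${line}`)
    .join("\n");
}
```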
Agent with Custom Tools
Directly from the langchain docs: “In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.” Each action the agent can take corresponds to one of the tools we’ve defined, each of which executes a custom function. Since we’re using GPT-4 as our LLM, this takes advantage of OpenAI’s function calling. Our tools take the OpenAPI spec as input and can perform a range of operations on it.
Searching for line numbers that contain a particular string
The LLM may need to locate regions of the spec that are not in close proximity to the validation error but contain a specific key.
```
WARN validation warn: [line 63] oas3-unused-component - `#/components/schemas/Pets` is potentially unused or has been orphaned

Using model gpt-4-0613
Running agent...

The error message indicates that the schema 'Pets' is potentially unused or has been orphaned. This means that it is not referenced anywhere in the document. To confirm this, I will use the document_text_searcher tool to search for any references to 'Pets' in the document. If there are no references, the schema 'Pets' can be removed from the document. If there are references, they need to be replaced with the correct schema.

Action: document_text_searcher
Action Input: 'Pets'
```
From the LLM logs, we can see that the `document_text_searcher` tool searches the document for `Pets` to see if any usage of this component exists across the schema. The wonderful thing about agents is that the result of this action can be fed as input to other tools before returning a final answer (see `get_line_range` below).
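The tool itself can be built with the same DynamicTool pattern as JsonPatchValidator above. This sketch is our illustration of what `document_text_searcher` might look like, not Speakeasy’s actual implementation:

```typescript
import { DynamicTool } from "langchain/tools";

// Hypothetical sketch of the document_text_searcher tool: returns the
// 1-based line numbers of every line containing the given string.
export class DocumentTextSearcher extends DynamicTool {
  constructor(specText: string) {
    super({
      name: "document_text_searcher",
      description:
        "Use this to find the line numbers of every line in the document that contains the given string.",
      func: async (input: string) => {
        const needle = input.trim().replace(/^['"]|['"]$/g, ""); // strip stray quotes
        const hits = specText
          .split("\n")
          .flatMap((line, i) => (line.includes(needle) ? [i + 1] : []));
        return hits.length > 0
          ? `found on lines: ${hits.join(", ")}`
          : "no matches found";
      },
    });
  }
}
```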
Retrieving specific line ranges
This is helpful since our validation errors, as well as the output of other tools, report the offending line number. The LLM uses this to retrieve the region of the spec it needs more information from, or where most fixes are likely to be applied.
Following the output of `document_text_searcher` above, the LLM receives the appropriate line numbers to conclude its search and validate the final answer.
```
The 'Pets' schema is referenced on lines 5 and 99. Line 99 is the definition of the 'Pets' schema itself, so it doesn't count as a usage. I need to check line 5 to see how 'Pets' is used. I will use the get_line_range tool to retrieve lines 1-20 for context.

Action: get_line_range
Action Input: '1-20'

The 'Pets' schema is not used in lines 1-20. It seems that the 'Pets' schema is indeed unused. I will suggest a JSON patch to remove the 'Pets' schema from the document. I will also validate the JSON patch using the json_patch_valid tool.

Action: json_patch_valid
Action Input: [{"op": "remove", "path": "/components/schemas/Pets"}]

The JSON patch is valid. I will now provide the final answer.

Final Answer:
{
  "suggested_fix": "Remove the unused 'Pets' schema from the components.",
  "json_patch": "[{\"op\": \"remove\", \"path\": \"/components/schemas/Pets\"}]",
  "reasoning": "The 'Pets' schema is not referenced anywhere in the document, making it unused and can be safely removed."
}
```
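For illustration, `get_line_range` could follow the same DynamicTool pattern; this sketch is an assumption, not the actual implementation:

```typescript
import { DynamicTool } from "langchain/tools";

// Hypothetical sketch of the get_line_range tool: returns the requested
// 1-based line range of the document, e.g. for the input '1-20'.
export class GetLineRange extends DynamicTool {
  constructor(specText: string) {
    super({
      name: "get_line_range",
      description:
        "Use this to retrieve a range of lines from the document. Example input: '1-20'",
      func: async (input: string) => {
        const match = /(\d+)\s*-\s*(\d+)/.exec(input);
        if (!match) return "error: input must be a line range like '1-20'";
        const start = Number(match[1]);
        const end = Number(match[2]);
        return specText
          .split("\n")
          .slice(start - 1, end)
          .map((line, i) => `${start + i}: ${line}`)
          .join("\n");
      },
    });
  }
}
```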
Resolving $ref definitions
Many OpenAPI specs contain `$ref` keys that point to definitions living in other parts of the document. The LLM should be able to locate the original definition when it needs it to suggest a fix. This is useful when we want to suggest an example for a component in the OpenAPI spec.
```
INFO validation hint: [line 104] missing-examples - Missing example for component. Consider adding an example

Using model gpt-4-0613
Running agent...

The error message indicates that there is a missing example for a component. Looking at the surrounding context, it seems that the error is on line 162, which corresponds to the "Pets" schema. This schema is an array of "Pet" objects. To fix this error, we need to add an example for the "Pets" schema. The example should be an array of "Pet" objects. We can use the "Pet" schema to create a valid example.

Action: ref_resolver
Action Input: #/components/schemas/Pet

The "Pet" schema requires an "id" and a "name", both of which are strings. An example "Pet" could be {"id": 1, "name": "Fido", "tag": "dog"}. We can use this to create an example for the "Pets" schema, which is an array of "Pet" objects. The JSON patch to add this example would be [{"op": "add", "path": "/components/schemas/Pets/example", "value": [{"id": 1, "name": "Fido", "tag": "dog"}]}]. Let's validate this patch.

Action: json_patch_valid
Action Input: [{"op": "add", "path": "/components/schemas/Pets/example", "value": [{"id": 1, "name": "Fido", "tag": "dog"}]}]

The JSON patch is valid. Therefore, we can suggest this patch as a fix for the error. The patch adds an example to the "Pets" schema, which is an array of "Pet" objects. The example is a valid "Pet" object, as it includes the required "id" and "name" properties.

Final Answer:
{
  "suggested_fix": "Add an example to the 'Pets' schema. The example should be an array of 'Pet' objects. An example 'Pet' object could be {'id': 1, 'name': 'Fido', 'tag': 'dog'}.",
  "json_patch": "[{\"op\": \"add\", \"path\": \"/components/schemas/Pets/example\", \"value\": [{\"id\": 1, \"name\": \"Fido\", \"tag\": \"dog\"}]}]",
  "reasoning": "The error message indicates that there is a missing example for a component. The 'Pets' schema is an array of 'Pet' objects, so the example should also be an array of 'Pet' objects. The 'Pet' schema requires an 'id' and a 'name', so the example 'Pet' object includes these properties."
}
```
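For illustration, a `ref_resolver` tool only needs to walk a local JSON Pointer through the parsed spec. This hypothetical sketch shows one way to do it:

```typescript
import { DynamicTool } from "langchain/tools";

// Hypothetical sketch of the ref_resolver tool: resolves a local $ref such
// as #/components/schemas/Pet to its definition in the document.
export class RefResolver extends DynamicTool {
  constructor(spec: Record<string, unknown>) {
    super({
      name: "ref_resolver",
      description:
        "Use this to resolve a local $ref (e.g. #/components/schemas/Pet) to its definition.",
      func: async (input: string) => {
        const pointer = input.trim().replace(/^['"]|['"]$/g, "").replace(/^#\//, "");
        let node: any = spec;
        for (const rawKey of pointer.split("/")) {
          // Undo JSON Pointer escaping: ~1 -> '/', ~0 -> '~'.
          const key = rawKey.replace(/~1/g, "/").replace(/~0/g, "~");
          if (node == null || typeof node !== "object" || !(key in node)) {
            return `error: ${input} does not exist in the document`;
          }
          node = node[key];
        }
        return JSON.stringify(node, null, 2);
      },
    });
  }
}
```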
By equipping the agent with the necessary toolkit to explore the spec JSON, we are able to traverse the document without exceeding the TPM (tokens per minute) limit. However, since each initial query to the agent can chain through a sequence of actions before returning a final answer, we wanted to assess another approach that would limit the number of OpenAI calls.
Retrieval Augmented Generation (OpenAPI Spec Embeddings in Chroma)
We use embeddings to store the spec for quick similarity searches without passing the full contents into the prompt, enabling us to respect the token limit. Specifically, we used RetrievalQAChain with the OpenAPI spec embedded in Chroma.
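As a rough sketch, assuming langchain-js with OpenAI embeddings (the exact chunking and configuration here are illustrative, not our production setup), the pipeline might look like:

```typescript
import { OpenAI } from "langchain/llms/openai";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { Chroma } from "langchain/vectorstores/chroma";
import { RetrievalQAChain } from "langchain/chains";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

export async function suggestFixViaRag(specText: string, validationError: string) {
  // Chunk the spec so each piece fits comfortably in the embedding model.
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 100 });
  const docs = await splitter.createDocuments([specText]);

  // Persist embeddings of the chunks in Chroma for similarity search.
  const store = await Chroma.fromDocuments(docs, new OpenAIEmbeddings(), {
    collectionName: "openapi-spec",
  });

  // The chain retrieves the most relevant chunks and stuffs them into the prompt.
  const chain = RetrievalQAChain.fromLLM(new OpenAI({ modelName: "gpt-4" }), store.asRetriever());
  return chain.call({
    query: `Return a JSON patch that fixes this validation error: ${validationError}`,
  });
}
```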
Results
We compared the execution time, the number of OpenAI calls, and the accuracy of both approaches. Although RAG performed fine on most simple errors, complex tasks such as moving duplicate inline schemas to shared components seemed impossible. At first glance, the returned JSON patch for such a task seems reasonable.
INFO {"error":"validation hint: [line 22] duplicate-schemas - Identical inline object schemas found: [line 22] - [line 31] conflicts with [line 107] - [line 116]. Consider moving to a shared component"}Asking for a Suggestion![{"op": "add", "path": "/components/schemas/Error", "value": {"type": "object", "required": ["code", "message"], "properties": {"code": {"type": "integer", "format": "int32"}, "message": {"type": "string"}}}}, {"op": "replace", "path": "/paths/~1pets/get/parameters/0/schema", "value": {"$ref": "#/components/schemas/Error"}}, {"op": "replace", "path": "/components/schemas/Error", "value": {"$ref": "#/components/schemas/Error"}}]
But upon further review, we found it had added the shared component and then swapped out that component’s own schema definition with the `$ref`, thinking it was one of the duplicate schemas itself, despite having a clear line number to check against. Since RAG can’t gather more context via function calls, it would use similarity search to suggest fixes for same-name fields on the wrong lines. Due to this behavior, this approach wasn’t sufficient for returning JSON patches with the correct paths.
The agent fared much better across validation errors spanning a broad range of complexity. Even though it took 2-3x as long to execute, the agent’s ability to “reason” through a sequence of actions via the execution of custom functions proved critical for good suggestion quality. And the ability to equip the agent with any new tool we can think of makes it more flexible and future-proof for our use case than embedding the OpenAPI schema. This is the most important factor in ensuring Suggest is a useful product, as OpenAPI specs vary greatly in shape, purpose, and functionality, much like the tools and people who produce them.
Conclusion
Even after settling on the agent approach, suggestion quality, rate limits, and execution time all still needed to be addressed. Our team has been tackling these roadblocks by parallelizing suggestions, passing additional guidance into the prompt for specific validation errors, and benchmarking various LLMs after fine-tuning.
Want to try it for yourself? Head over to the Speakeasy Suggest docs, or just copy-paste this workflow file into your own GitHub repo to get started!