API documentation is generally predictable, follows common patterns, and is one of the least interesting tasks in a documentation project. It's also a task that already has a fair amount of tooling and established practice for automatic generation.
It sounds like a perfect use case for AI-assistive tools!
In this post, I look at general and specialized tools for generating API documentation from code and text-based prompts. I also cover potential problems and pitfalls in generated docs and how to test the generated output to ensure its accuracy.
You have probably already heard of many of the tools and models I cover, such as flavors of OpenAI's GPT or Claude running in GitHub Copilot, but even with a relatively simple task, each of them returns quite different results. I also look at running local models and see how they compare. Finally, EkLine has a new feature that takes everything I show a step further.
I found an old ExpressJS-based book catalogue application, forked it, removed any code comments, and left the dependencies outdated. Every time I tested a new model, I reset any previous chat history. When using a VSCode plugin to test a model, I ensured that no files were open and deleted any API spec files generated in a previous step. I wanted to ensure that each model received as little context as possible.
I took the same approach to the prompts I used. I kept them as simple and minimal as possible, expanding and clarifying them only when needed to see whether a model could produce a better result.
With the project folder open in VSCode, I provided the following prompt to the Copilot extension:
Generate an API spec for this code
You can see the response in this blog post's GitHub repository.
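The file itself is fairly minimal. As a rough, hand-written sketch of its shape (shown as YAML here; the endpoint details are illustrative, and the actual output is in the repository), it looks something like this:

openapi: 3.0.0
info:
  title: Book API
  version: 1.0.0
paths:
  /books:
    get:
      summary: Returns all books
      responses:
        '200':
          description: OK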
I then took that file and imported it into RapidAPI, my API prototyping tool of choice (any will do), and ran it against the application running locally. Everything worked as expected.
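If you prefer the command line to a GUI tool, a quick curl against the running app gives the same sanity check. This sketch assumes the catalogue exposes a `/books` collection and listens on port 3000; adjust both to match your setup:

# Assumes the Express app is running locally on port 3000 and exposes /books
curl -s http://localhost:3000/books
# Fetch a single book by ISBN; adjust the path if your generated spec differs
curl -s http://localhost:3000/books/9781593275846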
But any good docs writer knows that "working" doesn't always mean "best". So, I added and enabled the VSCode Spectral extension for linting API specs and received a flurry of warnings. Technically, this is fine. Nothing in the prompt specified that the response should follow any best practices. You could improve and expand the prompt, but I'm unsure how far you would have to go: I experimented with asking the model to follow best practices and with linking to the style guides Spectral uses, but nothing changed the spec it returned.
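If you want to reproduce the linting, a minimal `.spectral.yaml` in the project root that extends Spectral's built-in OpenAPI ruleset produces the same kind of warnings. A sketch, with the severity override purely as an example:

# .spectral.yaml — extend Spectral's built-in OpenAPI ruleset
extends: ["spectral:oas"]
rules:
  # Example override: treat missing operation descriptions as errors instead of warnings
  operation-description: error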
I switched Copilot's model to Claude and used the same setup and prompt. Based on my experience, Claude is typically more verbose with its responses, which seems to mean it also expects slightly more explicit prompts.
So:
Generate an API spec for this code
gave me a fairly well-explained series of API endpoints, but not an actual spec file. Changing the prompt to:
Generate an API spec file for this code
Instead got me the response I expected. You can see the response in this blog post's GitHub repository. The response had a lot more detail. First, I tested it again in RapidAPI, and everything worked, but this time, there was also example data prefilled in the payloads for each endpoint. Opening the file with Spectral, I still saw warnings, but far fewer of them, and they concerned fields such as `operationId` and `description`, which I was able to resolve with the following prompt:
generate an api spec file for this code, make sure the description and operationid fields have meaningful values
The results from Claude seemed more considered and thorough, but I was still unable to get it to follow any kind of style guide without spelling everything out by hand. I tried including links to style guides and mentioning them by name, but nothing really changed the output.
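To give a sense of what "meaningful values" looks like in practice, here is a hand-written sketch (not Claude's verbatim output) of the kind of path entry the refined prompt produced:

paths:
  /books/{isbn}:
    get:
      operationId: getBookByIsbn
      summary: Get a single book
      description: Returns the book matching the supplied ISBN, or a 404 error if no match exists.
      parameters:
        - name: isbn
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: The requested book
        '404':
          description: No book found for the supplied ISBN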
There are various reasons why you might want to run a machine learning model and pass data through it locally, be it privacy, cost, technical flexibility, or investigation. There are myriad ways to run and use local models and myriad models to choose from. Be warned that some models can be tens of gigabytes in size. Do your research!
For this example, I use Ollama with the Mistral model. First, install Ollama and then pull and run the model with the following command:
ollama run mistral
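As an aside, if you would rather script against the model than chat with it, Ollama also exposes a local REST API (on port 11434 by default). A minimal sketch:

# Ollama's local REST API listens on port 11434 by default
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Generate an OpenAPI spec file for an Express book catalogue API",
  "stream": false
}'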
Running `ollama run mistral` opens a command prompt in the terminal, which highlights one of the negatives of running models locally: how do you interact with it, provide context, and integrate it into your IDE? Unsurprisingly, there are a few options. For use within VSCode, so far I have found Continue to be the most effective. The interface looks a lot like the Copilot interface, and you can choose the Ollama-hosted Mistral model from the bottom of the chat pane in much the same way.
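Pointing Continue at the Ollama-hosted model is a small configuration change. A sketch of the relevant entry, assuming the JSON config format Continue used at the time of writing (`~/.continue/config.json`); newer versions may differ:

{
  "models": [
    {
      "title": "Mistral (local)",
      "provider": "ollama",
      "model": "mistral"
    }
  ]
}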
You need to give the extension (or Ollama and the model) more context than the Copilot extension. It won't infer anything from open files by default, but it does have some placeholder tags you can use to tell the extension where to look. Use `@Codebase` to analyze all open files:
Generate an API spec file for this code @Codebase
You can see the response in this blog post's GitHub repository.
Initially, it looked great, but the response varied widely from attempt to attempt, ranging from non-functional specs to hallucinated paths. After refining the prompt, you end up with a spec that's quite similar to the one from GPT-4, in that it works but is minimal.
I could improve the response in similar ways with the following prompt:
generate an api spec file for this @Codebase , make sure the description and operationid fields have meaningful values and don't use duplicate keys
Perhaps larger local models would require less prompt tweaking.
At this point, I started to wonder: if an API spec is so predictable, maybe I could get one by providing just a JSON object, such as the one inside _server/book-api.js_:
[
  {
    "isbn": "9781593275846",
    "title": "Eloquent JavaScript, Second Edition",
    "author": "Marijn Haverbeke",
    "publish_date": "2014-12-14",
    "publisher": "No Starch Press",
    "numOfPages": 472
  },
  {
    "isbn": "9781449331818",
    "title": "Learning JavaScript Design Patterns",
    "author": "Addy Osmani",
    "publish_date": "2012-07-01",
    "publisher": "O'Reilly Media",
    "numOfPages": 254
  },
  {
    "isbn": "9781449365035",
    "title": "Speaking JavaScript",
    "author": "Axel Rauschmayer",
    "publish_date": "2014-02-01",
    "publisher": "O'Reilly Media",
    "numOfPages": 460
  }
]
So, I tried each model again with only this JSON as input, to see whether it worked and whether the output differed much. I used "Create an API spec based on this JSON" as the base prompt, providing the JSON alongside it.
You can find the outputs in the blog's repository. In summary, almost all the responses were the same, apart from the locally running Mistral model, which only returned a spec with an endpoint to return all books. Providing a follow-up prompt to add all "CRUD operations" fixed that.
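What the matching responses had in common was a `Book` schema inferred from the JSON fields, with CRUD paths keyed on the ISBN built around it. A hand-written sketch of that inferred schema (not any single model's verbatim output):

components:
  schemas:
    Book:
      type: object
      properties:
        isbn:
          type: string
        title:
          type: string
        author:
          type: string
        publish_date:
          type: string
          format: date
        publisher:
          type: string
        numOfPages:
          type: integer
      # required fields are illustrative; check the repository outputs for what each model chose
      required:
        - isbn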
In almost every case highlighted in this post, the AI has done a reasonable job of creating a technically functioning specification, with varying degrees of conformity to a hypothetical style guide or set of best practices. However, none allowed you to specify a style guide, or at least, not in a standard, codified way. You had to spell every requirement out in human language. This is fine, but tedious, and it isn't how most modern development teams work, especially when tools and standards for defining APIs already exist.
This is where the EkLine docs reviewer steps in. You can define a style guide, and the AI behind the EkLine reviewer incorporates it when you're working with spec files in EkLine tools. Even better, it goes beyond API specification style guides: when working with values in longer text fields such as `description`, it also takes your broader English language style guides into account, bringing cohesive writing to even more touchpoints with users and customers.
Take for example:
…
paths:
  /books:
    get:
      summary: Get a list of books
      responses:
        '200':
          description: A list of books
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/Book'
…
The docs reviewer uses an API style guide, defined with Vacuum in this case, and suggests you specify `maxItems` based on that style guide. It also suggests a better value for `description`, based on the API style guide and the language style configuration in the EkLine dashboard.
For example:
Retrieve book details using the ISBN. Requires a valid ISBN as a path parameter. Returns book information in JSON format if found, or a 404 error if the book is not available.
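Applied to the earlier snippet, the `maxItems` suggestion is a one-line change. A sketch (the limit of 100 is just an example value):

paths:
  /books:
    get:
      summary: Get a list of books
      responses:
        '200':
          description: A list of books
          content:
            application/json:
              schema:
                type: array
                # upper bound suggested by the API style guide; pick a limit that suits your API
                maxItems: 100
                items:
                  $ref: '#/components/schemas/Book'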
If you try any of the prompts above, you may not get the same results as I did, which was the first problem I found: none of the models gives consistent answers to the same prompt, which sits somewhere between a problem and an annoyance. Whether this is due to model or tooling updates, or the general inconsistency of models, who knows, but your mileage will definitely vary. Tools like EkLine, which factor API and English style guide rules into generative AI tooling, can help tame something powerful but unpredictable into something consistent, making AI reliable and predictable. Just the way APIs are supposed to be.