Teaching machines to code

Great documentation is important for humans, but more so for machines. The concept of ‘tiered documentation’ means that both developers and LLMs get what they need.

As important as code is, documentation of that code is arguably more important. No developer, and no software, exists in a vacuum; unless other developers can understand the code you’ve written, it loses much of its potential impact.

But what about machines? Do they also need good docs?

The answer is yes, and it points to a future of “tiered documentation,” a term I first saw described by Vlad Ionescu. As he details, tiered documentation means “having one set of documentation for human users and having another set of documentation specifically for LLM [large language model] training.” The former needs to be easily consumed by people; the latter needs to be detailed so that tools such as Amazon CodeWhisperer or GitHub Copilot will yield ever-improving code. It’s a fascinating concept with the ultimate aim of improving developer productivity. So, what do we need to get there?

The importance of great docs

Ask a developer what she needs to be productive, and invariably the answer is “great documentation.” In fact, SlashData has asked that question for years, and docs always top the list:

[Figure: Good documentation consistently ranks first on developers’ wishlists. Source: SlashData]

This is, of course, more easily said than done. Although we know the importance of docs (e.g., for transmitting knowledge, as developer Jeremy Mikkola posits), writing them is invariably the task software developers least want to do. As Kislay Verma notes, writing good documentation is really hard, and not as much fun as writing the code itself.

Well, it just got harder.

For developer Jakub Kočí, “The biggest problem [in writing docs] is clarity.” After all, he continues, “We’re writing code for humans first, not for machines. Making it work is just a half of the solution, making it well-structured and maintainable is another … often more difficult part.” That might have been true in 2022 when Kočí first said that, but in 2024, it’s arguably just as important for machines to understand your documentation as it is for developers, given the rise of LLM-driven coding assistants like Amazon CodeWhisperer or GitHub Copilot.

Machines need different documentation than people do: more detailed, for one thing.

Introducing tiered documentation

As Ionescu suggests, “Tiered documentation is something a few folks are experimenting with as a solution/workaround for LLM code assistants…being dumb because docs are dumb.” Some software companies are trying to solve this by working directly with partners to feed sample code, docs, etc., directly into the LLM. My employer, MongoDB, has done this with AWS. It works but isn’t scalable. Ideally, as a software developer, whether you’re an individual or a corporation, you want to build documentation that LLMs will crawl on their own.

You also need to ensure LLMs will understand your software at a deep level so that they can return the best possible code when developers prompt them. Unfortunately, as Ionescu laments, “Most developer documentation (or even user documentation) is usually written for newbies and that’s now a blocker.” For a person, it’s perfectly appropriate to give quick starts and basic code samples, but feed that kind of limited data to a machine, and it will “struggle to provide serious, production-level code suggestions.”

The idea behind tiered documentation is that “by default, crawling bots for LLMs [will] get super-detailed, in-depth docs, and humans [will] get friendlier docs,” Ionescu summarizes.
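
To make the idea concrete, here is a minimal sketch of how a docs site might pick a tier per request. It is an illustration under stated assumptions, not anything Ionescu or a vendor has published: the function names, the two sample doc strings, and the choice to detect crawlers by user-agent substring are all hypothetical, and the marker list is only an example of what a site might match on.

```python
# Hypothetical sketch: choose a documentation tier from the request's
# User-Agent header. Names and marker strings are assumptions for
# illustration; real crawlers identify themselves in varying ways.

LLM_CRAWLER_MARKERS = ("gptbot", "ccbot")  # example substrings only

DOCS = {
    # Friendlier tier aimed at people: quick starts, basic samples.
    "human": "Quick start: install the package and run the sample.",
    # Detailed tier aimed at LLM crawlers: exhaustive reference material.
    "machine": "Full reference: every parameter, error case, and production example.",
}


def select_doc_tier(user_agent: str) -> str:
    """Return 'machine' if the user agent looks like an LLM crawler, else 'human'."""
    ua = user_agent.lower()
    if any(marker in ua for marker in LLM_CRAWLER_MARKERS):
        return "machine"
    return "human"


def serve_docs(user_agent: str) -> str:
    """Return the documentation body for the tier matching this user agent."""
    return DOCS[select_doc_tier(user_agent)]
```

Matching on the user agent is the simplest possible routing rule and is easy to fool; a real deployment would likely need something sturdier, and would still have to write the detailed tier in the first place, which is the hard part.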

That’s the idea. What’s the reality? Well, reality bites, at least for now. To my knowledge, no one has successfully done this, but there’s no reason it can’t be done. It will be tricky to deliver docs that satisfy both humans and machines, but as we figure out the methodologies, the ultimate winner will be developers.

We’re a long way from LLMs being able to spit out code effectively and consistently enough to replace compilers, as O’Reilly’s Mike Loukides argues. But we’re already living in a world where LLMs can assist developers in writing great code. Improving documentation for developers and the LLMs upon which they increasingly depend will be crucial to advancing developer productivity.

Copyright © 2024 IDG Communications, Inc.