// lesson 01

Spec before you generate

updated 2026.07.05// field-manual

The first time an agent handed me a clean, working feature that solved the wrong problem, I didn't notice for three days. It ran. The tests I'd asked for passed. It just quietly answered a slightly different question than the one I had in my head, and by the time I felt the gap it was wired into four other things.

That's the lesson that reorganized how I build. The fix is boring and it works: before the model writes a single line, I decide what can't drift. Not how it's built. What has to be true when it's done. Prompting is not a spec. A prompt is a wish. A spec is a contract, and the model will honor exactly the contract you actually wrote, not the one you meant.

Why does AI-built code drift?

Because a model can't tell the difference between the part of your request you cared about and the part you threw in because you were typing fast. To the model it's all just tokens. Every place you left ambiguous, it fills with the most probable guess, and probable is not the same as correct. You get code that looks right, reads right, runs right, and quietly encodes three assumptions you never made on purpose.

The danger isn't that it breaks loudly. It's that it works. Working-but-wrong is the expensive failure, because you build on top of it before you find out. Spec-and-generate front-loads the disagreement to the cheapest possible moment: before anything exists.

The frontier teams landed in the same place

This isn't my private superstition. It's what the people building the strongest agents in the world tell you to do, in plain sight.

Anthropic's own guidance for its coding agent recommends a deliberate split: explore and make a plan before writing any code, rather than letting the model jump straight to generation. On non-trivial work the plan-first path measurably beats prompt-and-pray. (Anthropic, "Claude Code best practices")

OpenAI's guide to building agents makes the same point from the other side: vague, underspecified instructions are one of the most common reasons agents behave unpredictably. Their fix is to define the task, the steps, and the edge cases explicitly, up front. (OpenAI, "A practical guide to building agents")

Strip both down and you get the same load-bearing sentence: the human decides what, the model decides how. The whole discipline is refusing to let the model quietly take over the what.

What "spec before you generate" actually looks like

It's lighter than it sounds. I'm not writing a requirements binder. Before a real build I pin six things: what it does, why, what goes in, what comes out, the invariants (the properties that must stay true no matter how the model implements it), and the edge cases that would embarrass me in production. Twenty minutes. Sometimes five.

Then I hand the model the spec and let it cook on the how. That's the half I want it improvising on. This is the line I keep coming back to: spec the invariants, vibe the rest. The invariants are the fence. Inside the fence, the model's faster and often better than me. Outside it, it drifts, and the fence is the only thing that tells me it drifted.

The failure mode nobody warns you about

Over-speccing. The first time this clicks, the temptation is to spec everything, including the how, and now you've handcuffed the one collaborator who's genuinely good at implementation. You'll feel productive and you'll ship slower and worse. The skill isn't writing more spec. It's writing the right spec: pin what actually breaks the thing if it drifts, and deliberately leave the rest open. If you can't name what breaks when a line drifts, that line doesn't belong in the spec yet.

The other failure: speccing in your head. A spec you didn't write down is a vibe. It feels like a plan and it protects nothing, because when the model drifts you've got no artifact to check it against. Written or it didn't happen.

The takeaway: the spec is the one thing that outlives the prompt, the thread, and the rewrite. Write down what can't drift, then let the model move.

Questions that keep coming up

Isn't this just waterfall with extra steps? No. Waterfall specs the implementation and locks it. You're speccing the invariants and leaving the implementation loose on purpose. It's closer to writing a test than writing a blueprint. You're defining what "correct" means, not how to get there.

How much spec is enough? Enough to name what must stay true and what would count as broken. If you can hand it to someone (or something) that's never seen the problem and they can't build the wrong thing without violating a line you wrote, it's enough. Anything past that is you speccing the how, and that's the part you want to give away.

Next: Lesson 2, start simple. Don't build the agent you don't need yet.

If you're formalizing your own methodology after the messy phase and want to compare notes on what holds at production scale, /work-with-us.