// lesson 04

Make the bad state impossible

updated 2026.07.05 // field-manual

On a pricing tool I built for a contractor, the AI never once computes the number that goes in front of the customer. Not ever. The money math is plain deterministic code, the kind that does the same thing every single time. The model helps around the edges: it reads inputs, drafts language, and flags things. It does not decide the price. That was a deliberate line, and it's the most important line in the whole system.
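Here's roughly what that split can look like, as a minimal sketch rather than the actual tool. The rate table, markup, tier names, and `compute_price` are all hypothetical. The model's only output is a couple of structured fields; the price itself comes from `Decimal` arithmetic that returns the same answer for the same inputs every time and refuses inputs it doesn't recognize.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical rates and markup. In a real tool these live in config, never in a prompt.
RATES_PER_SQFT = {"basic": Decimal("4.50"), "premium": Decimal("7.25")}
MARKUP = Decimal("0.20")  # hypothetical 20% markup


def compute_price(sqft: int, tier: str) -> Decimal:
    """Same inputs, same price, every time. No model call anywhere in here."""
    if tier not in RATES_PER_SQFT:
        raise ValueError(f"unknown tier: {tier!r}")
    # Reject floats, strings, bools, and non-positive values instead of guessing.
    if not isinstance(sqft, int) or isinstance(sqft, bool) or sqft <= 0:
        raise ValueError(f"sqft must be a positive integer, got {sqft!r}")
    total = RATES_PER_SQFT[tier] * sqft * (1 + MARKUP)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# The model's job ends at turning a messy request into these two fields.
model_extracted = {"sqft": 1200, "tier": "premium"}
print(compute_price(**model_extracted))  # 10440.00
```

If the model misreads the request, the worst it can do is hand over bad fields, and those either fail validation or produce a price that's wrong for a reason a human can trace.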

Here's why, and it's the core of this lesson: a model that's right ninety-nine percent of the time will, given enough runs, eventually put a wrong price in front of a paying customer. That one time isn't a rounding error; it's the time that ends the relationship. So the rule is simple: if something must never happen, you don't ask the model to be careful about it. You build the system so the bad thing can't happen in the first place.
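The arithmetic behind "given enough runs" is short. Under the simplifying assumption that every run is independent and right with the same probability, the chance of at least one wrong output across n runs is 1 - 0.99^n. At 100 quotes that's about 63%; by 500 it's over 99%.

```python
def chance_of_at_least_one_error(accuracy: float, runs: int) -> float:
    """P(at least one wrong output in `runs` tries), assuming independent tries
    that each come out right with the same probability `accuracy`."""
    return 1 - accuracy ** runs


for runs in (10, 100, 500, 1000):
    print(f"{runs:>5} runs: {chance_of_at_least_one_error(0.99, runs):.1%}")
```

Real errors are rarely independent (the same confusing input tends to fool the model the same way), but that doesn't rescue the argument: the odds per run are never zero, so the total only climbs.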

Why isn't "be careful" good enough?

Because "be careful" is a probability, and what you need is a guarantee. Prompting a model to avoid a mistake lowers the odds; it doesn't eliminate them. For anything where failure is unacceptable rather than merely unfortunate, lowered odds are not a safety mechanism; they're a countdown. The only real control is structural: make the dangerous action require a permission the model doesn't have, or route it through code that cannot produce the bad value.
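One way to make a permission structural, sketched in Python: the agent only ever receives a handle to the tools it was granted, so an ungranted tool isn't forbidden, it's absent. `ToolBox`, `read_record`, and `delete_record` are hypothetical names. In-process Python isn't a hard security boundary, so in production the real enforcement is the credential the agent runs with; this just shows the shape.

```python
class PermissionDenied(Exception):
    pass


class ToolBox:
    """Exposes only the tools this agent was explicitly granted."""

    def __init__(self, tools: dict, granted: set):
        # Ungranted tools are filtered out here and never stored on this object.
        self._tools = {name: fn for name, fn in tools.items() if name in granted}

    def call(self, name: str, *args, **kwargs):
        if name not in self._tools:
            raise PermissionDenied(f"tool {name!r} not granted")
        return self._tools[name](*args, **kwargs)


def read_record(record_id):
    return {"id": record_id}


def delete_record(record_id):
    raise RuntimeError("this should be unreachable from the agent")


ALL_TOOLS = {"read_record": read_record, "delete_record": delete_record}
agent_tools = ToolBox(ALL_TOOLS, granted={"read_record"})

print(agent_tools.call("read_record", 42))
try:
    # Whatever the model "decides", this path is closed.
    agent_tools.call("delete_record", 42)
except PermissionDenied as err:
    print("blocked:", err)
```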

This is how the frontier teams frame safety, too. Anthropic's guidance is built on guardrails and least privilege: give the agent the minimum permissions it needs, prefer actions that can't cause harm, and put hard limits in the system rather than in the instructions (Anthropic, "Building Effective Agents"). OpenAI's agent guide stacks layered guardrails and explicit halt conditions for the same reason: so a single confidently wrong output can't reach the thing it could damage (OpenAI, "A practical guide to building agents").
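A minimal sketch of layered guardrails with explicit halt conditions, assuming a hypothetical `propose` callback that stands in for the model. This isn't the API of OpenAI's Agents SDK or any other framework; `Action`, `Halt`, `MAX_STEPS`, and `MAX_SPEND` are illustrative. Each check runs in code before the action takes effect, so a confident-but-wrong proposal stops at the limit instead of reaching anything.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str    # e.g. "draft" or "finish"
    cost: float  # hypothetical cost of taking this action


class Halt(Exception):
    """Raised when a hard limit stops the run."""


MAX_STEPS = 5
MAX_SPEND = 10.0


def run_agent(propose) -> float:
    spent = 0.0
    for step in range(MAX_STEPS):
        action = propose(step)
        # Guardrail 1: spend cap, checked before the action is counted or taken.
        if spent + action.cost > MAX_SPEND:
            raise Halt(f"spend cap hit at step {step}")
        spent += action.cost
        if action.kind == "finish":
            return spent
    # Guardrail 2: hard step limit, so a looping model cannot run forever.
    raise Halt("step limit reached")
```

A model stuck in a loop hits the step limit; a model that keeps proposing expensive actions is stopped before the step that would cross the cap. Neither outcome depends on the model noticing its own mistake.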

The failure mode: safety by instruction

The seductive mistake is putting the guardrail in the prompt: "Never delete production data." "Always double-check the total." It reads like a safeguard, but it's really just a hope with good grammar. The prompt is the one layer the model can misread, ignore under pressure, or reason its way around. If the constraint only exists in words the model is free to reinterpret, it isn't a constraint.

Put it in the layer the model can't touch. Permissions it wasn't granted. A deterministic function it has to call. A validation gate that rejects the bad state before it lands. The model can be as wrong as it wants and the bad thing still can't happen.
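Here's a validation gate in that spirit, as a sketch with hypothetical names and a made-up discount rule. The model can draft a quote with whatever numbers it likes; nothing is written until `gate_quote` recomputes the total itself and every rule passes.

```python
from decimal import Decimal


class GateRejected(Exception):
    """Raised when a proposed record would put the system in a bad state."""


MAX_DISCOUNT = Decimal("0.15")  # hypothetical business rule: at most 15% off


def gate_quote(line_items: list[Decimal], discount: Decimal, total: Decimal) -> None:
    if not line_items or any(item <= 0 for item in line_items):
        raise GateRejected("every line item must be positive")
    if not (Decimal("0") <= discount <= MAX_DISCOUNT):
        raise GateRejected(f"discount {discount} outside 0..{MAX_DISCOUNT}")
    # Recompute instead of trusting the proposed total.
    expected = (sum(line_items) * (1 - discount)).quantize(Decimal("0.01"))
    if total != expected:
        raise GateRejected(f"total {total} != recomputed {expected}")


def save_quote(db: list, line_items, discount, total) -> None:
    gate_quote(line_items, discount, total)  # nothing lands unless the gate passes
    db.append({"items": line_items, "discount": discount, "total": total})
```

The gate doesn't care why a bad total showed up (a hallucination, a stale draft, a typo); it only checks whether the state is legal.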

The takeaway: don't ask, enforce. The strongest guardrail is the one that makes the illegal state impossible to represent, not the one that asks nicely.
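In Python, "impossible to represent" is partly aspirational, since anyone can construct any class, but the shape still helps. This sketch, with hypothetical names, splits a draft from a priced quote so the send function only accepts the priced kind. A type checker such as mypy flags passing a `DraftQuote` to `send_to_customer` before the code runs, and the runtime checks back that up. Languages with stricter type systems can push more of this to compile time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DraftQuote:
    text: str  # model-written language; free to vary


@dataclass(frozen=True)
class PricedQuote:
    text: str
    price_cents: int  # integer cents: no float rounding drift

    def __post_init__(self):
        # A PricedQuote with a missing or nonsensical price cannot be constructed.
        if (not isinstance(self.price_cents, int)
                or isinstance(self.price_cents, bool)
                or self.price_cents <= 0):
            raise ValueError("price_cents must be a positive int")


def price(draft: DraftQuote, price_cents: int) -> PricedQuote:
    # Intended entry point: price_cents should come from deterministic pricing code.
    return PricedQuote(draft.text, price_cents)


def send_to_customer(quote: PricedQuote) -> str:
    # Runtime backstop for callers that skip the type checker.
    if not isinstance(quote, PricedQuote):
        raise TypeError("only priced quotes can be sent")
    return f"{quote.text} Total: ${quote.price_cents / 100:.2f}"
```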

Questions that keep coming up

Doesn't that defeat the point of using AI? No, it aims it. AI is genuinely great at fuzzy, high-variance work: drafting, reading, suggesting. It's the wrong tool for exact, zero-tolerance work. Use it where variance is a feature, and wall it off where variance is a catastrophe.

Where do I draw the line? Anywhere a single wrong output is unrecoverable or unacceptable: money, deletions, sends, permissions, anything a customer reads as a promise. Those go to deterministic code. Everything else can flex.
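The line-drawing rule can be written down as a lookup, sketched here with hypothetical category names taken from the lists in this lesson. One design choice goes slightly past the text: unknown categories fail closed to the deterministic path, so a new kind of action lands on the strict side until someone consciously decides it can flex.

```python
from enum import Enum


class Handling(Enum):
    DETERMINISTIC = "deterministic"  # code produces the value; the model may only supply inputs
    FLEX = "flex"                    # model output may be used, subject to normal review


# Hypothetical category names.
ZERO_TOLERANCE = {"money", "deletion", "send", "permissions", "customer_promise"}
FLEX_OK = {"draft", "read", "suggest", "flag"}


def handling_for(category: str) -> Handling:
    if category in ZERO_TOLERANCE:
        return Handling.DETERMINISTIC
    if category in FLEX_OK:
        return Handling.FLEX
    # Fail closed: anything nobody has classified yet gets the strict path.
    return Handling.DETERMINISTIC
```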

Next: Lesson 5: reversible by default, gate the irreversible.
