// lesson 05

Reversible by default, gate the irreversible

updated 2026.07.05 // field-manual

In the build for the very page you're reading, the agent runs almost completely free. It writes files, edits them, runs tests, rewrites its own work, iterates a dozen times, all on a branch. I'm barely in the loop for any of it, because every one of those actions is reversible. There's exactly one place a human has to press the button, and that's the deploy to production. Not because the agent couldn't press it. Because deploy is the one action in the whole chain I can't cleanly take back.

That's the frame this lesson is about: sort every action by whether you can undo it, and spend your ceremony accordingly. Reversible actions get speed and autonomy. Irreversible ones get a gate. Most people get this wrong in one of two directions: either supervising everything (which kills the speed that made AI worth using) or supervising nothing (which is fine right up until the one autonomous action you couldn't undo).

Which actions actually need a human?

The small set that can't be walked back. Deploys. Money moving. Emails and messages sent. Records deleted. Anything a customer or a bank or an inbox now treats as final. Everything else (branch edits, drafts, experiments, local runs) can and should move at machine speed, because if it's wrong you just roll it back and go again.
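One way to make the split concrete is a small gate in front of every action the agent takes. This is a minimal Python sketch under stated assumptions, not a prescribed implementation: the action kinds, `gate_reason`, `run`, and the `approve` callback are all hypothetical names. Reversible kinds run straight through; known irreversible kinds, and anything nobody has classified yet, wait for a human.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical action kinds, sorted the way the lesson sorts them.
REVERSIBLE = {"edit_file", "write_draft", "run_tests", "local_run"}
IRREVERSIBLE = {"deploy_prod", "move_money", "send_email", "delete_record"}


@dataclass
class Action:
    kind: str
    detail: str = ""


def gate_reason(kind: str) -> Optional[str]:
    """Return why an action needs a human, or None if it may run freely."""
    if kind in REVERSIBLE:
        return None
    if kind in IRREVERSIBLE:
        return "irreversible"
    # Fail closed: a kind nobody has classified is gated, not waved through.
    return "unclassified"


def run(action: Action,
        execute: Callable[[Action], None],
        approve: Callable[[Action, str], bool]) -> str:
    """Execute `action`, asking the hypothetical `approve` hook first if gated."""
    reason = gate_reason(action.kind)
    if reason is not None and not approve(action, reason):
        return "blocked"
    execute(action)
    return "executed"


if __name__ == "__main__":
    executed: List[str] = []

    def record(action: Action) -> None:
        executed.append(action.kind)  # stand-in for the real side effect

    def decline(action: Action, reason: str) -> bool:
        print(f"human asked about {action.kind} ({reason}): declined")
        return False

    for kind in ["edit_file", "deploy_prod", "post_to_slack"]:
        print(kind, "->", run(Action(kind), record, decline))
    print(executed)  # ['edit_file']
```

The fail-closed default is the design choice that matters here: a new tool the agent picks up (posting to Slack, in this made-up example) lands behind the gate until someone consciously decides it can be undone.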

This is straight out of the frontier labs' playbook. Anthropic's guidance explicitly prefers reversible actions and reserves human approval checkpoints for the high-stakes, hard-to-undo steps (Anthropic, "Building Effective Agents"). OpenAI's agent guide builds in human-in-the-loop review and halt-and-escalate conditions precisely at the high-risk boundaries, so the agent handles the reversible majority and a person owns the irreversible few (OpenAI, "A practical guide to building agents").

The failure mode: drawing the line in the wrong place

The mistake that bites is misjudging what's reversible. "It's just an email, we can send a correction" treats an irreversible thing as reversible, and now a wrong message is in a thousand inboxes. Meanwhile, other people gate the genuinely reversible stuff, sit there approving every file write, and wonder why the agent feels slow. Get the classification right and the workflow almost designs itself. Get it wrong in one direction and you've built a bottleneck; get it wrong in the other and you've planted a time bomb.

There's a bonus in building it this way. The gate can move. Once the reversible work has proven clean over enough runs, you can promote the agent's autonomy one notch at a time, the same way trust actually gets earned. Capability built now, permission granted later, no rebuild required.
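A moving gate can be as simple as a ledger of clean runs per action class. The sketch below is hypothetical throughout: the `AutonomyLadder` class, the ladder order, and the default threshold of 20 consecutive clean, human-reviewed runs are assumptions chosen to illustrate the idea, not tuned numbers. Only the lowest unpromoted rung earns credit, one bad run resets that class and takes its autonomy back, and irreversible kinds are refused outright.

```python
from typing import Dict, List, Optional, Set


class AutonomyLadder:
    """Hypothetical tracker for handing an agent one action class at a time."""

    def __init__(self, ladder: List[str], irreversible: Set[str],
                 threshold: int = 20) -> None:
        banned = [k for k in ladder if k in irreversible]
        if banned:
            # Irreversible kinds stay behind a human gate; they never auto-promote.
            raise ValueError(f"cannot auto-promote irreversible kinds: {banned}")
        self.ladder = list(ladder)          # order in which autonomy is handed over
        self.threshold = threshold          # assumed clean-run streak per rung
        self.autonomous: Set[str] = set()
        self.streak: Dict[str, int] = {k: 0 for k in self.ladder}

    def next_rung(self) -> Optional[str]:
        """The lowest unpromoted rung: the only one that can earn promotion."""
        for kind in self.ladder:
            if kind not in self.autonomous:
                return kind
        return None

    def record(self, kind: str, clean: bool) -> None:
        """Log one human-reviewed run of `kind`, promoting or demoting as needed."""
        if kind not in self.streak:
            return  # not on the ladder, so it stays gated
        if not clean:
            # One bad run resets the streak and takes the autonomy back.
            self.streak[kind] = 0
            self.autonomous.discard(kind)
            return
        if kind != self.next_rung():
            return  # only the current rung accumulates evidence
        self.streak[kind] += 1
        if self.streak[kind] >= self.threshold:
            self.autonomous.add(kind)

    def is_autonomous(self, kind: str) -> bool:
        return kind in self.autonomous


if __name__ == "__main__":
    # Hypothetical rungs; threshold lowered to 3 just for the demo.
    ladder = AutonomyLadder(["merge_to_staging", "open_pull_request"],
                            irreversible={"deploy_prod"}, threshold=3)
    for _ in range(3):
        ladder.record("merge_to_staging", clean=True)
    print(ladder.is_autonomous("merge_to_staging"))  # True
    print(ladder.next_rung())  # open_pull_request
```

In this sketch `deploy_prod` can't even be placed on the ladder; moving that particular gate stays a deliberate human decision rather than something a streak of good runs unlocks.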

The takeaway: speed on what you can undo, ceremony on what you can't, and real thought on knowing which is which. The line between reversible and irreversible is the most important line in your whole workflow.

Questions that keep coming up

Can't I just approve everything to be safe? You can, and you'll have rebuilt a slow manual process with extra steps. Approving the reversible stuff buys you no safety and costs you all the speed. Aim the supervision at the irreversible edge.

How do I promote autonomy safely? One boundary at a time, on evidence, on the reversible actions first. Let the agent own a class of action, watch it stay clean for a stretch, then hand it the next. Never jump straight to autonomy on the irreversible ones.

Next: Lesson 6, when it breaks, triage before you rebuild.

If you're building an AI workflow and want help drawing the reversible line in the right place, see /work-with-us.