Build the agent that knows where to stop

2026.06.20 // field-notes // 3 min read

I built a bot that gives the tour of my own site. It greets you, walks you through what the system does, and here's the part I was nervous about: it fires its own tools while it talks. You don't ask it to pull up the latest posts. It just does it, mid-sentence, because the tour is supposed to feel like someone showing you around, not a menu you operate.

Once an agent can act on its own, the interesting question stops being whether it can act. It's where it stops. A bot that auto-advances through a script and fires its own tools is one config change away from also auto-opening the part of the site where you would hand over money. That is the line I didn't let it cross.

The tour can fire every tool it has. It pulls up posts, it lights up the project map, it carries you from the terminal across to the blog without a click. But there's one tool it fires and then waits on: the one that offers to start a paid engagement. The bot renders the offer. It does not navigate you there. Auto-firing the rest is helpful. Auto-walking a visitor into a commercial intake page they didn't choose to open is the one place proactive turns into pushy, and there is no undo on the feeling that gives someone.

the line was in the spec before it was in the code

That boundary wasn't a judgment I made while coding. It was an invariant I wrote down before the autonomy existed. The tour spec carries a rule that says the handoff tools auto-fire but the commercial one only ever renders an affordance, never auto-navigates. By the time I was wiring the proactive behavior, the code could not drift into auto-opening the close, because the contract had already forbidden it. The restraint was a line in the spec, not a thing I happened to remember.
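A minimal sketch of how a rule like that could live in a spec that the code has to pass, rather than in memory. Everything here is an assumption for illustration: the tool names, the `Policy` enum, and the `check_invariants` helper are hypothetical, not the site's actual tour spec.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    AUTO_FIRE = "auto_fire"      # the bot may run this tool on its own
    RENDER_ONLY = "render_only"  # the bot may show an offer, never navigate


@dataclass(frozen=True)
class ToolSpec:
    name: str
    policy: Policy
    commercial: bool = False


# Hypothetical tool names, for illustration only.
TOUR_SPEC = (
    ToolSpec("show_latest_posts", Policy.AUTO_FIRE),
    ToolSpec("light_project_map", Policy.AUTO_FIRE),
    ToolSpec("handoff_to_blog", Policy.AUTO_FIRE),
    ToolSpec("offer_engagement", Policy.RENDER_ONLY, commercial=True),
)


def check_invariants(spec):
    """Return names of tools that break the rule; an empty list means the spec holds."""
    return [
        tool.name
        for tool in spec
        if tool.commercial and tool.policy is not Policy.RENDER_ONLY
    ]


def dispatch(tool):
    """Decide what the bot does when the tour reaches this tool."""
    if tool.policy is Policy.AUTO_FIRE:
        return {"tool": tool.name, "action": "fire"}
    # Render-only tools surface an affordance and then wait for the visitor.
    return {"tool": tool.name, "action": "render_affordance"}
```

Running `check_invariants(TOUR_SPEC)` at startup or in a test is what would make the "one config change" fail loudly: flipping the commercial tool to `AUTO_FIRE` returns its name instead of an empty list.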

That's spec-first discipline pointed at autonomy: the invariant comes before the implementation. When you decide where an agent stops after you have already built it, you are patching, and you will always be one excited afternoon away from pulling the patch because the demo feels smoother without it. When you decide it first, the boundary outlives your mood. I've written before about giving an agent standing authority safely, the rate caps and the revert path. This is the sibling move: not just how much power, but which specific actions stay behind a human.

auto-fire the reversible, gate the irreversible

The rule generalizes cleanly, and it's the same instinct behind every guardrail worth shipping. Sort your agent's actions by whether they reverse. Pulling up a post, relighting a map, narrating a feature: reversible, cheap to undo, fine to auto-fire. Starting a transaction, sending the message, opening the checkout, deleting the record: irreversible, or socially irreversible, and those keep a human in the loop by default. The agent can tee them up. It doesn't get to pull the trigger.
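One way to sketch that sort in Python, assuming each action carries a `reversible` flag that a human set in advance. The `Action` and `Gate` names and the example actions are hypothetical; this shows the shape of the rule, not a finished framework.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    name: str
    reversible: bool          # decided by a human ahead of time, not by the agent
    run: Callable[[], str]


@dataclass
class Gate:
    pending: list = field(default_factory=list)

    def submit(self, action):
        """Reversible actions fire immediately; the rest are teed up for a human."""
        if action.reversible:
            return action.run()
        self.pending.append(action)
        return f"awaiting human: {action.name}"

    def confirm(self, name):
        """Called from the human side only: run a teed-up action by name."""
        for index, action in enumerate(self.pending):
            if action.name == name:
                return self.pending.pop(index).run()
        raise KeyError(name)


gate = Gate()
shown = gate.submit(Action("show_post", True, lambda: "post shown"))
teed_up = gate.submit(Action("open_checkout", False, lambda: "checkout opened"))
```

The design choice worth noticing is that the agent only ever calls `submit`. `confirm` belongs to whatever the human clicks, so the agent has no code path that pulls the trigger on its own.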

I'm usually arguing the other direction. Most of what I build is about removing gates, trusting the system, letting it ship without me standing over it. So it's worth being honest that the same judgment cuts both ways. Calibrating gates isn't a bias toward fewer of them. It's deciding, per action, whether the cost of the agent getting it wrong is bigger than the cost of the friction. For publishing a blog post, the friction loses and I let it run. For walking a stranger into my hire flow, the friction wins and the bot waits. Spec the invariants, vibe the rest: the invariant here is the one place the agent stops.
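The per-action comparison can be written down as a tiny decision rule. This is a sketch under stated assumptions: weighting the cost of a mistake by how likely it is goes beyond what the paragraph says, and the numbers are placeholders in arbitrary units, not measurements.

```python
def should_gate(cost_if_wrong, chance_wrong, friction_cost):
    """Gate the action when the expected cost of a mistake exceeds the cost of asking."""
    if not 0.0 <= chance_wrong <= 1.0:
        raise ValueError("chance_wrong must be between 0 and 1")
    return cost_if_wrong * chance_wrong > friction_cost


# Placeholder values, chosen only to show both outcomes.
gate_publish_post = should_gate(cost_if_wrong=2.0, chance_wrong=0.1, friction_cost=1.0)
gate_hire_flow = should_gate(cost_if_wrong=50.0, chance_wrong=0.1, friction_cost=1.0)
```

With these placeholder inputs the blog post runs ungated and the hire flow waits for a human. The point of writing it as a function is that the answer is computed per action, so it can land on either side.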

If you're building a proactive agent and trying to work out which actions it should own and which it should only ever tee up, and you want a sparring partner before it oversteps, work with VibeKoded.

A proactive agent is easy to build. An agent that knows the one button to leave for you is the one people trust.

// see also

When your AI agent can fire itself, build the revert first

What real guardrails look like for vibe coding

One decision, traced end to end: the I-AUTOSHIP invariant