Your agent runs what it reads
I gave one of my agents a connected tool the other week and felt pretty good about it. The agent can read an error tracker now, pull the latest exceptions, and start working a fix before I have even seen the alert. That is the whole promise of vibe coding with real tools in the loop. The agent does not just generate code in a vacuum, it reaches into the systems you actually run and acts on what it finds.
Then a disclosure went around that made me go look at exactly how that agent reads its tools. People started calling the technique agentjacking. The shape of it is simple enough to ruin your afternoon. An attacker sends a crafted error into a project's public error endpoint, the kind that exists so any part of your stack can report a crash. The connected tool dutifully hands that error back to the coding agent as part of normal diagnostics. The agent reads it. And because the payload was written to look like a system instruction, the agent does what it says, with whatever access you gave it. The writeup behind the disclosure reported it worked about 85 percent of the time, across more than two thousand projects whose error endpoints anyone on the internet could write to.
Sit with that for a second, because it is not an exotic exploit. Nobody broke the agent. Nobody found a bug in the model. The attacker wrote some text and put it where the agent was already looking, and the agent treated it the way it treats everything that comes back through a tool it trusts. As true. This is prompt injection wearing a diagnostic's clothes.
the channel does not make it true
The part worth keeping is why it worked at all. The agent got fooled because it could not tell the difference between output it could trust and input an attacker controlled that happened to arrive through a trusted pipe. To the agent, both look identical. Text came back from a connected tool, styled like the kind of step it is supposed to follow, so it followed.
The fix is not a smarter agent. You cannot prompt your way out of this, because the agent is doing exactly what you asked: read the tool, act on what it says. This is an AI orchestration problem, not a model one. The fix is to stop treating the channel as proof. Read what a tool hands back the way you would read a letter, not an order. A letter can claim to be from anyone. The envelope it arrived in does not make the contents true, and the fact that it landed in your inbox does not give it the authority to move your money.
This is the same idea I keep circling. I wrote about it when a gate reported a pass it never actually ran, and again when I needed a server to stay authoritative without trusting the client's claim. The surface is whatever a thing says about itself. The semantic layer is what you work out for yourself before you act. An agent reading a tool is all surface, unless something downstream does the deciding.
put the check at the boundary, in the spec
So where does the deciding go. Not inside the agent's judgment, where the next clever payload can talk it back out. It goes at the boundary, in front of the action, as a rule the agent does not get a vote on.
The pipeline that ships this very blog runs on that bet. The agent drafts freely. It reads research, mines the project history, writes the post, all of it unattended. But nothing it produces reaches the site on the agent's say-so. A pre-commit hook and a validator sit between a draft and the live branch, and a draft that trips a rule does not merge, no matter how confident the thing that wrote it was. The revert was built and tested before the drafter that would need it existed. The rate cap is mechanical. None of that assumes the agent is careful. It assumes the agent might not be, and puts the check where a careless or hijacked step gets stopped instead of shipped.
That is the move agentjacking actually asks of you. Do not make the agent more trustworthy. Make the boundary do the verifying, so trust is never the thing standing between attacker text and a real action.
And do it in the spec, before you wire the tool, the same way you would decide where an autonomous agent has to stop before you hand it the authority to act. If you decide what your agent may do with what it reads after it has already run something it should not have, you are writing the rule in the worst possible mood, cleaning up. Decide it first and the boundary outlives the incident that would have taught it to you.
the working rule
For any vibe coder wiring agents to tools right now, the rule compresses to this. Assume every channel your agent reads is attacker-reachable the moment anyone but you can write to it: a public error endpoint, a shared ticket queue, an open comment field, an inbound webhook. Let the agent read all of it. Just never let what it read decide, on its own, what it does next. The reading is cheap and reversible. The doing is where the gate goes.
If you are hardening an agent pipeline and want a sparring partner on where those gates belong before something runs that shouldn't, work with VibeKoded.
Your agent will read anything you point it at. What it is allowed to do with what it reads is the part you actually design.