The gate that passed without running

2026.06.06 // field-notes // 6 min read

A few days ago I went looking for a bug that turned out to be the absence of one. Our content pipeline has a small rule: no em dashes in the blog posts it writes. It is a brand-voice thing, one of four layers that keep the writing sounding like a person instead of a model. The rule is enforced by a pre-commit hook, a little gate that scans each staged post before it can be committed and rejects the commit if it finds the forbidden character. I had watched that gate pass dozens of times. Green, green, green, every commit. What I had not done was check whether it was actually scanning anything.

It was not.

A green check that scanned zero bytes

The gate was one line of shell. It used grep with a Perl regular expression to hunt for the em-dash codepoint, U+2014, in the staged files. On a normal Linux box that works fine. On the machine I actually run the pipeline from, a Windows POSIX bash, that exact grep invocation does not run at all. It errors out before it scans a single byte, with a complaint about the locale:

grep: unsupported regex engine: supports only unibyte and UTF-8 locales

I never saw that message, because the line ended in 2> /dev/null. Somebody, some past version of me, had redirected the error stream into the void to keep the hook output tidy. So the command failed, its complaint went nowhere, and the hook moved on to its next step as if the scan had come back clean. Exit zero. Gate passed. The check that was supposed to be reading every line of every post had been throwing its hands up and getting waved through, commit after commit, for as long as the pipeline had been running on that machine.
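To make the mechanism concrete, here is a small model of it in Node. grep reports three outcomes through its exit status: 0 when a line matched, 1 when nothing matched, and 2 or higher when it hit an error. The sketch assumes the original hook only asked "did grep match?", which is my reading of the behavior rather than a copy of the hook, and both function names are invented for illustration.

```javascript
// Hypothetical model of how a hook can read grep's exit status.
// grep's documented statuses: 0 = a line matched, 1 = nothing matched, 2+ = error.

// Fail-open reading: only a match blocks, so an error looks like a clean scan.
function failOpenGate(grepStatus) {
  return grepStatus === 0 ? 'block' : 'pass';
}

// Fail-closed reading: only an explicit "nothing matched" is allowed through.
function failClosedGate(grepStatus) {
  if (grepStatus === 1) return 'pass';
  return 'block'; // a match (0) or any error (2+) stops the commit
}

// Compare the two readings for a match, a clean scan, and an errored scan.
for (const status of [0, 1, 2]) {
  console.log(status, failOpenGate(status), failClosedGate(status));
}
```

The two readings agree on statuses 0 and 1 and differ only on 2, the errored scan. That is the one case the old hook never distinguished.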

This is the kind of thing that does not show up in any failing build, because nothing fails. A test that breaks tells you it broke. A gate that fails open tells you nothing. It reports the same cheerful green whether it caught everything or caught nothing, and the only way to tell the two apart is to go and look at the gate itself, which is exactly the thing a green check trains you to stop doing.

Fail open is worse than no gate

A vibe coder lives on green checks. The whole appeal of vibe coding is that you can move fast because the machinery underneath is watching your back: the hook, the validator, the type checker, the test run. You ship at the surface and trust that the layers below are measuring what you produced. That trust is the productivity. It is also the exposure.

Here is the part worth slowing down on. A gate that is missing is an honest absence. You know the rule is not enforced, so you watch for it yourself. A gate that fails open is a dishonest presence. It tells you the rule is enforced when it is not, so you stop watching, and now the hole is covered by a checkmark you believe. The second situation is strictly worse than the first. I would rather have no em-dash gate and know it than have one that lies to me in green.

That is not a quirk of one shell command. It is a general property of automated enforcement. Every check has two failure modes, not one. It can fail to pass good work, which is annoying and loud and gets fixed fast because somebody is blocked. Or it can fail to block bad work, which is silent and patient and gets discovered, if ever, long after the bad work has shipped. When you write a gate, the second failure mode is the one that actually matters, and it is the one almost nobody designs for, because designing for it means asking the deeply unfun question: what happens when this check itself breaks?

Build the failure surface with the same care as the check

The fix was small. I replaced the grep with a tiny node script that reads the staged file, splits it on newlines, and reports any line carrying U+2014. Node was already a dependency of the hook a few steps later, so this added nothing new. The difference is that the node version cannot quietly evaporate on a locale it does not like. If it hits a real em dash it exits non-zero and names the line number. If the file is clean it stays quiet. Same job, but now the failure mode is a loud block instead of a silent shrug.

The move I care about is not the language swap. It is that I tested both directions before I trusted it again. I staged a draft with a real em dash and confirmed the commit was blocked at the right line. I staged a draft with only ordinary hyphens and confirmed it passed clean, no false alarm. A gate is not verified when it passes a good file. It is verified when it blocks a bad one. Until you have watched it reject something it is supposed to reject, you have not tested the gate, you have tested that it does not crash, and those are very different claims. The first em-dash check passed that second, weaker test every single day.
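The two-direction test can itself be written down as a check on the check. The sketch below is illustrative: verifyGate and the two fixture drafts are hypothetical, and a gate here is any function that takes text and returns true when it would block the commit.

```javascript
// Fixtures: the gate must block the first draft and pass the second.
const BAD_DRAFT = 'First line.\nA real em dash \u2014 right here.\n';
const GOOD_DRAFT = 'First line.\nOrdinary hyphens - and -- only.\n';

// Run a gate in both directions and list what it got wrong.
function verifyGate(gate) {
  const problems = [];
  if (gate(BAD_DRAFT) !== true) problems.push('did not block the known-bad draft');
  if (gate(GOOD_DRAFT) !== false) problems.push('blocked the known-good draft');
  return problems;
}

// A gate that really scans, and a stand-in for a scan that never runs.
const workingGate = (text) => text.includes('\u2014');
const neverBlocks = () => false;

console.log(verifyGate(workingGate)); // empty list: both directions hold
console.log(verifyGate(neverBlocks)); // one problem: the bad draft got through
```

The stand-in gate passes the good draft perfectly. Only the known-bad fixture exposes it, which is why that half of the test carries the weight.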

This is a principle we keep returning to inside the build, the one I think about as building the failure surface first. The happy path gets all the attention because it is what you are trying to make. The failure path gets none, because thinking about it feels like planning to lose. But the failure path is where a system actually lives or dies, and a check is nothing but a small machine whose entire job is the failure path. If you do not specify how it breaks, it will break in the quietest, most expensive way available, which is to keep saying yes. We wrote about a sibling version of this in "why a passing test does not mean the app is fixed", and about the layer that almost skipped enforcement entirely in "when the build skips enforcement". The thread through all of them is the same surface-versus-semantic line: the green check is a surface signal, and the only thing that settles the question is a measurement of what the check actually did.

If your gates have never been tested against a failure

If you are orchestrating AI to build real software and you are leaning on a stack of hooks, validators, and CI checks to keep it honest, it is worth asking when you last watched each one fail on purpose. A vibe-coded pipeline is only as trustworthy as its quietest gate, and the quiet ones are precisely the ones you have stopped looking at. VibeKoded can audit the enforcement layer of an AI build, find the checks that pass without running, and install the spec-first gate discipline that fails loud instead of failing open. Send the current setup and where you suspect the holes are. VibeKoded can scope a gate-audit pass, a spec-first rebuild of the orchestration layer, or a standing enforcement setup that gets tested against the failures it is supposed to catch. Work with VibeKoded.

// part of the spec-first methodology topic

// see also

The commit that shipped to nowhere

When your AI agent can fire itself, build the revert first

A real delegation handoff, spec tight enough to execute cold