// Verify, don't trust · lesson 03

A green suite proves it ran, not that it's right

updated 2026.07.05// field-manual

I shipped a feature once that passed every test I had and was quietly wrong the whole time. Nothing threw. No red anywhere. The app loaded, the numbers filled in, the form said saved, and every value it showed was fabricated. The test suite whose entire job was to catch that had signed off on all of it. A green suite, it turns out, answers a much smaller question than the one we hear it answering.

Here's the smaller question it actually answers: did the code run without throwing, and did it match the assertions someone wrote down? That's it. Whether those assertions describe the right answer is a separate question the suite never asks. Green means "executed and matched expectations." It does not mean "the output is correct," and on AI-built code the gap between those two is exactly where the expensive failures live, because a model that writes the code and the tests in the same breath writes tests that inherit the code's assumptions instead of checking them.

Why does the suite miss it?

Because it's testing the code against a copy of the code's own beliefs. If the code decided a failed parse should quietly become a plausible default, the test the same model writes will assert that a failed parse becomes that default, and both agree forever, in perfect green. The suite isn't measuring behavior against reality. It's measuring the code against itself. That's not verification, it's a mirror, and a mirror will never tell you the thing in it is wrong.

What real verification looks like here

Assert on real values, not on "no error." A test that only confirms a function returned something is testing that the lights are on. Make it check that the number is the actual right number, that the saved record holds what was submitted, that the account on the dashboard is the correct account. And treat "it ran" as unverified, not as done, because the absence of an error is the absence of information, not a passing grade. The suite tells you the code executed. Whether the answer is right is a check you have to make on purpose, against something outside the code's own head.

The takeaway: green means it ran and matched its own assertions, not that the answer is right, so assert on real values and never let a passing suite stand in for verifying the output. The mirror can't catch the mirror.