// Verify, don't trust · lesson 05

Multi-AI consensus, or checking the same trade on three DEXes

When something really matters and I can't afford a confident-wrong answer, I don't ask one model. I ask three, separately, and watch what they do. Claude, ChatGPT, Grok, each with the same question and no knowledge of the others' answers. The pattern in their agreement or disagreement tells me more than any single answer could, because a single answer, however confident, is one source with no way to check itself.

The way I hold it is trading. Any single model's answer is a chart that might have no liquidity behind it. Looks perfect, could be a rug. Running the same question past three independent models is checking that trade on three different DEXes. If all three show the same price, your confidence is earned; the agreement across independent sources is the signal. If they diverge, you haven't wasted the effort, you've found the exact thing to go verify, because the disagreement is pointing straight at the part that isn't solid.

Why does agreement across models mean more than confidence from one?

Because the models are independent in the way that matters: different training, different tendencies, different failure modes. A hallucination is a specific wrong turn in one model's probability space, and there's no reason a different model with different weights would take the same wrong turn on the same question. So when three independent systems land on the same answer, the odds that they all independently hallucinated the identical falsehood are low. Their agreement is evidence in a way one model's confidence never is, because confidence is just the model's internal state and agreement is a fact about the world outside any one of them.

Where this earns its keep, and where it doesn't

Use it for the load-bearing calls: the fact you're about to build on, the number that goes in front of a client, the architectural decision that's expensive to reverse. Don't burn it on everything, because it's slower and three models is three times the cost. The judgment is the same judgment as the rest of verification: spend the rigor where being wrong is expensive. And note the deeper move, which is the point of the whole next lesson: consensus works because it brings in sources from outside any single model's head.

The takeaway: one model's confidence is not evidence, but agreement across independent models is, so for the calls that matter, check the trade on three DEXes and treat divergence as the map to what needs verifying.