// How these things actually work · lesson 01

Autoregression: the output is the thinking

updated 2026.07.05// field-manual

The thing that reorganized how I prompt was realizing there is no hidden reasoning engine. None. When a model answers you, it isn't thinking somewhere private and then reporting back. The generating IS the thinking, out loud, one token at a time, in front of you.

Here's the mechanism, and it's simpler and stranger than most people assume. The model produces text one token at a time, and each token is a probability-weighted pick across its entire vocabulary. It writes a token, feeds everything so far back into itself, and picks the next one. That's the whole loop. There's no plan sitting behind the words that the words are describing. The words are the plan, forming as they go.

Why does chain-of-thought actually work, then?

Because the intermediate tokens become a scratchpad the next token gets to read. When you tell a model to "think step by step," you're not flipping on a reasoning mode. You're giving it room to write tokens that later tokens can attend to. The reasoning it writes down becomes part of the input for the answer that follows. The thinking has to be on the page to exist at all, because the page is where the thinking happens.

That single fact changes how you work. You stop asking a model to "think harder" silently, because there's no silent place for it to think. Instead you give it room to reason on the page before it commits to an answer. Space to write is space to think. They're the same thing here.

What this changes about how you prompt

Two moves fall out of it. First, when you want better reasoning, ask for the reasoning explicitly, before the conclusion, because the tokens are the compute. A model that answers first and justifies second has already rolled the dice; the justification is decoration. A model that reasons first is actually using the earlier tokens to shape the later ones. Second, you stop trusting confident one-line answers to hard questions, because a hard answer with no visible working had no room to be worked out.

The takeaway: the output is the thinking, not a report on it. Give the model room to think on the page, and read the reasoning as the reasoning, not as an afterthought.