// How these things actually work · lesson 07

Context-window economics

updated 2026.07.05 // field-manual

A model that was sharp at the start of a session gets vaguer and sloppier as the conversation runs long. It's not your imagination, and it's not the model getting tired. It's economics, and understanding it tells you why a tight context beats a big one.

Two things happen as the context fills. First, cost: in a standard transformer the attention computation scales quadratically with length, so a conversation twice as long costs roughly four times as much to attend over, not twice. Second, and more important for quality, the more tokens you pack in, the less attention each individual token gets. Attention weights are normalized to sum to one, so they are a finite budget spread across everything present. A short, clean context gives each token a real share. A bloated one starves them all.
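Here is a back-of-envelope sketch of both effects. It assumes plain full self-attention, where every token is scored against every other token and the weights for each query sum to one. The function names are made up for illustration, and real deployments use caching and other optimizations, so read this as the shape of the curve rather than a bill.

```python
def pairwise_scores(n_tokens: int) -> int:
    """Query-key scores computed by full self-attention over n tokens."""
    return n_tokens * n_tokens


def mean_attention_share(n_tokens: int) -> float:
    """Average weight per token when one query's weights must sum to 1."""
    return 1.0 / n_tokens


# Each doubling of length quadruples the scores and halves the average share.
for n in (1_000, 2_000, 4_000):
    print(n, pairwise_scores(n), mean_attention_share(n))
```

Going from 1,000 to 2,000 tokens takes the score count from one million to four million, while the average share per token drops from 0.001 to 0.0005.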

Why does more context make each piece matter less?

Because attention is shared, not stacked. Every token you add is another claimant on the same fixed pool of relevance. Past a point, the signal you actually care about is competing with hundreds of tokens of stale conversation, half-abandoned tangents, and things that mattered an hour ago and don't now. The model isn't ignoring your important instruction on purpose. It's giving it a thinner slice because you made it share the table with too much.
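A toy version of that thinning slice, assuming a single softmax over one signal token and a pile of equally scored noise tokens. The scores (3.0 for the signal, 0.0 for the noise) are invented for illustration; a real model has many heads and layers with learned scores, so this shows the mechanism only, not measured behavior.

```python
import math


def signal_weight(signal_score: float, noise_score: float, n_noise: int) -> float:
    """Softmax weight on one signal token competing with n_noise noise tokens."""
    signal = math.exp(signal_score)
    noise = n_noise * math.exp(noise_score)
    return signal / (signal + noise)


# The signal's score never changes; only the crowd around it grows.
for n_noise in (10, 100, 1000):
    print(n_noise, round(signal_weight(3.0, 0.0, n_noise), 3))
```

With these made-up scores, the signal holds about two thirds of the weight against 10 noise tokens, about a sixth against 100, and about 2% against 1,000. Nothing about the instruction got weaker. The table just got more crowded.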

The way I picture it is floor space at the mill. An efficient factory isn't the one with the most equipment crammed in. It's the one where every machine has room to actually operate. Context is the same. A window jammed full is a floor so packed nothing can move.

The move this points to

Two of them. First, keep context tight and treat compression as a feature, not a loss, because fewer tokens means more signal per token. Second, when a job is genuinely big, don't cram it into one degraded context. Split it across focused models, each running with a clean, high signal-to-noise window, and make yourself the router deciding what goes where. Multi-AI orchestration isn't just about using different strengths. It's context-window economics: protecting each model's attention by never overloading any single one.
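One way to picture the router role is the minimal sketch below. Everything in it is hypothetical: `Worker` stands in for one focused model session, `estimate_tokens` is a crude four-characters-per-token guess rather than a real tokenizer, and the budget is a number you pick. It makes no actual model calls. It only decides what goes where.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: assume about 4 characters per token."""
    return max(1, len(text) // 4)


@dataclass
class Worker:
    """One focused session with its own context budget (hypothetical)."""
    name: str
    budget: int
    context: list[str] = field(default_factory=list)
    used: int = 0

    def try_add(self, item: str) -> bool:
        """Add the item only if it fits in the remaining budget."""
        cost = estimate_tokens(item)
        if self.used + cost > self.budget:
            return False
        self.context.append(item)
        self.used += cost
        return True


def route(items: dict[str, list[str]], budget: int) -> list[Worker]:
    """Give each topic its own window; open a clean one when a window fills."""
    workers: list[Worker] = []
    for topic, notes in items.items():
        part = 1
        current = Worker(f"{topic}-{part}", budget)
        workers.append(current)
        for note in notes:
            if estimate_tokens(note) > budget:
                raise ValueError(f"note too big for a {budget}-token window; compress it first")
            if not current.try_add(note):
                # Window is full: start a fresh one instead of overfilling.
                part += 1
                current = Worker(f"{topic}-{part}", budget)
                workers.append(current)
                current.try_add(note)
    return workers


job = {
    "schema": ["users table: id, email, created_at", "orders table: id, user_id, total"],
    "tests": ["login fails on expired token", "checkout rounds totals incorrectly"],
}
for worker in route(job, budget=12):
    print(worker.name, worker.used, len(worker.context))
```

The design choice that matters is the refusal to overfill. When a window hits its budget, the router opens a clean one, and a note too large for any window is sent back for compression instead of being squeezed in.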

The takeaway: attention is a finite budget that thins as context grows, so tight beats big and splitting beats cramming. This is also why your working memory belongs outside the thread, which is a whole discipline of its own further into the manual.