Context Window

Beginner · Generative AI

Last updated June 10, 2026

What is Context Window in simple terms?

In simple terms, the context window is how much an AI can keep in mind at once. Picture a desk that holds only so many pages: when a conversation runs too long, the earliest part slides out of view.

What is Context Window?

The context window is the maximum amount of text an AI language model can take in and consider at one time — everything you've said, everything it has replied, and any documents you've added — measured in tokens, beyond which the oldest material drops out of view.

Every AI language model has a limit on how much text it can hold in front of it at any one moment, and that limit is the context window. Think of everything the model is working with right now — your latest question, the earlier back-and-forth of the conversation, the instructions it was given, and any file or document you've pasted in — all of it has to fit inside this window together. The size is measured in tokens, the small chunks of text that models read and write in, and it varies a lot between models: some can hold only a few thousand tokens, while the largest today stretch to hundreds of thousands or more, enough for entire books. Whatever the size, it is a hard ceiling on how much the model can pay attention to at once.
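To make "fitting inside the window" concrete, here is a minimal Python sketch. It assumes a rough rule of thumb of about four characters per token for English text; real tokenizers differ by model and language. The names `estimate_tokens` and `fits_in_window` and the two window sizes are illustrative, not taken from any particular product.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers vary by model and language; this is only a rule of thumb.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a piece of text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_window(parts: list[str], window_tokens: int) -> bool:
    """Check whether all parts together stay within the window size."""
    total = sum(estimate_tokens(part) for part in parts)
    return total <= window_tokens


instructions = "You are a helpful assistant."
history = "User: Hi. Assistant: Hello, how can I help?"
document = "word " * 10_000  # stand-in for a long pasted document
question = "What does section 3 say?"

parts = [instructions, history, document, question]
print(fits_in_window(parts, 8_000))    # False: the document alone is ~12,500 tokens
print(fits_in_window(parts, 128_000))  # True
```

The point of the sketch is that everything counts toward the same total: instructions, history, documents, and the new question all share one budget.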

The most important consequence is what happens when a conversation or document runs past that ceiling: the oldest material drops out of the window and the model itself can no longer see it. This is a big part of why an AI chatbot can seem to "forget" something you told it much earlier in a long chat — those exact words have scrolled out of its view. It's worth separating the raw model from the product built around it, though: many chatbot apps soften this with behind-the-scenes engineering, such as automatically summarizing earlier turns or keeping key instructions aside, so the gist of the conversation can survive even after the original wording is gone. That scaffolding works around the limit rather than removing it — whatever the model actually sees, including any summary standing in for older text, still has to fit inside the window. It's also worth being clear that the context window is not memory in the human sense, and it's not the same as what the model learned during training. Training knowledge is baked in permanently; the context window is more like short-term working memory that is wiped clean at the start of every new conversation. Anything you want the model to use has to be inside the window at the moment it answers.
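The kind of scaffolding described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how any specific chatbot works: tokens are approximated as whitespace-separated words, `placeholder_summary` is a hypothetical stand-in for what would likely be a separate model call in a real product, and `SUMMARY_RESERVE` is an arbitrary amount of room set aside for the summary line.

```python
# Assumption: a few tokens of room reserved for the summary line.
SUMMARY_RESERVE = 6


def count_tokens(text: str) -> int:
    # Assumption: one word is about one token; real tokenizers differ.
    return len(text.split())


def placeholder_summary(turns: list[str]) -> str:
    # Hypothetical stand-in: a real product would likely ask a model
    # to write an actual summary of these turns.
    return f"[Summary of {len(turns)} earlier turns]"


def build_context(instructions: str, turns: list[str], budget: int) -> list[str]:
    """Pin the instructions, keep the newest turns that fit, summarize the rest."""
    used = count_tokens(instructions)
    recent: list[str] = []
    # Walk backwards from the newest turn, keeping turns while they fit.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget - SUMMARY_RESERVE:
            break
        recent.append(turn)
        used += cost
    recent.reverse()
    older = turns[: len(turns) - len(recent)]
    context = [instructions]
    if older:
        context.append(placeholder_summary(older))
    return context + recent


instructions = "Always answer in plain English."
turns = [f"turn {i} " + "word " * 8 for i in range(10)]  # 10 tokens each
for line in build_context(instructions, turns, budget=40):
    print(line)
```

Even here, the summary and the pinned instructions still consume part of the budget, which is the sense in which the scaffolding works around the limit rather than removing it.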

Because the window is both limited and, on paid services, something you're often charged for by the token, using it well matters. Feeding a model only the relevant parts of a long document rather than the whole thing, summarizing earlier conversation to make room, and putting key instructions where they won't get crowded out are all practical skills. Techniques like retrieval-augmented generation exist partly to work around this limit — fetching just the handful of relevant passages and slotting them into the window instead of trying to cram in an entire library. As context windows have grown larger, models can take on bigger tasks in one go, but the underlying truth never changes: if something isn't in the window, the model can't take it into account.
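The idea of fetching only the relevant passages can be illustrated with a deliberately simple sketch. Note the swap: real retrieval-augmented generation systems typically rank passages with embeddings or a search index, whereas this example uses plain keyword overlap so that it stays self-contained. The function names, the small stop-word list, and the word-count token estimate are all assumptions made for illustration.

```python
# Assumption: a tiny stop-word list so common words do not count as matches.
STOP_WORDS = {"the", "a", "an", "of", "on", "is", "for", "can", "when", "with"}


def count_tokens(text: str) -> int:
    # Assumption: one word is about one token; real tokenizers differ.
    return len(text.split())


def keywords(text: str) -> set[str]:
    """Lowercase the words, strip basic punctuation, and drop stop words."""
    words = {w.strip(".,?!").lower() for w in text.split()}
    return words - STOP_WORDS


def score(passage: str, question: str) -> int:
    """Count how many distinct question keywords appear in the passage."""
    return len(keywords(passage) & keywords(question))


def select_passages(passages: list[str], question: str, budget: int) -> list[str]:
    """Pick the best-matching passages that fit within a token budget."""
    ranked = sorted(passages, key=lambda p: score(p, question), reverse=True)
    chosen: list[str] = []
    used = 0
    for passage in ranked:
        if score(passage, question) == 0:
            break  # remaining passages share no keywords with the question
        cost = count_tokens(passage)
        if used + cost <= budget:
            chosen.append(passage)
            used += cost
    return chosen


passages = [
    "The tenant must pay rent on the first day of each month.",
    "Either party may terminate the lease with sixty days written notice.",
    "The landlord is responsible for structural repairs.",
]
question = "When can the tenant terminate the lease?"
print(select_passages(passages, question, budget=12))
```

With a budget of 12 tokens, only the passage about terminating the lease is selected; the rest of the "library" never enters the window.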

Real-world example of Context Window

Someone pastes a 60-page legal contract into an AI assistant and spends an hour asking questions about it, clause by clause. Early on the answers are sharp. But as the conversation stretches on and they keep adding follow-ups, the assistant starts giving vaguer responses about the contract's opening sections — and eventually contradicts something it said at the start. Nothing has broken. The combined length of the contract plus the long conversation has overflowed the context window, so the earliest material — including parts of the contract itself — has slid out of the model's view. Starting a fresh chat and pasting in only the specific clause they care about instantly fixes it, because now everything relevant fits inside the window again.
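A back-of-the-envelope calculation shows why the overflow in this example arrives sooner than people expect. Every number below is an illustrative assumption, not a measurement: about 500 tokens per page, a hypothetical 32,000-token window, and about 400 tokens per question-and-answer turn.

```python
def turns_until_overflow(window_tokens: int, document_tokens: int,
                         tokens_per_turn: int) -> int:
    """Return how many full question-and-answer turns fit alongside the document."""
    remaining = window_tokens - document_tokens
    if remaining <= 0:
        return 0  # the document alone already fills the window
    return remaining // tokens_per_turn


# Illustrative assumptions only:
PAGES = 60
TOKENS_PER_PAGE = 500     # rough guess for dense legal text
WINDOW_TOKENS = 32_000    # hypothetical window size
TOKENS_PER_TURN = 400     # one question plus one answer

document_tokens = PAGES * TOKENS_PER_PAGE  # 30,000 tokens
print(turns_until_overflow(WINDOW_TOKENS, document_tokens, TOKENS_PER_TURN))  # 5
```

Under these assumptions the contract takes up most of the window by itself, leaving room for only a handful of exchanges before the earliest material starts to fall out.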

Frequently asked questions about Context Window

What is the difference between a context window and a model's memory?

They sound similar but aren't the same. The context window is the text the model can see right now, in this conversation — it's temporary and resets when you start a new chat. A model's "memory" can mean two other things: the knowledge baked in permanently during training, or, in some products, a separate feature that deliberately saves facts about you across conversations. The context window is none of that — it's just the working space holding the current exchange, and once something falls outside it, the model can no longer use it unless you bring it back in.

How does the context window work?

When you send a message, the application bundles together everything in play (your prompt, the earlier conversation, the model's past replies, and any attached text), and the model processes it all at once as a single block of tokens. The context window is the maximum size of that block. As long as everything fits, the model can refer to any part of it when forming an answer. When the total would exceed the limit, something has to be left out: chat apps typically drop or summarize the oldest material to make room, while some services refuse the oversized request instead. That is why long sessions can lose track of how they began.
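One common way of handling overflow, dropping the oldest messages first, can be sketched as follows. This is a simplified illustration rather than a description of any particular product: it drops whole messages instead of individual tokens, approximates tokens as whitespace-separated words, and `trim_to_window` is a hypothetical helper name.

```python
def count_tokens(text: str) -> int:
    # Assumption: one word is about one token; real tokenizers differ.
    return len(text.split())


def trim_to_window(messages: list[str], window_tokens: int) -> list[str]:
    """Drop the oldest messages until the remainder fits in the window."""
    kept = list(messages)  # copy so the caller's list is left untouched
    while kept and sum(count_tokens(m) for m in kept) > window_tokens:
        kept.pop(0)  # the oldest message goes first
    return kept


messages = [
    "My name is Ada and I live in Paris.",   # oldest
    "Nice to meet you, Ada.",
    "What is a token?",
    "A token is a small chunk of text.",     # newest
]
print(trim_to_window(messages, window_tokens=20))
```

With a 20-token window the first message is dropped, so the statement about living in Paris is no longer visible to the model, even though a later message still happens to mention the name.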

Why does context window size matter?

It sets how much a model can take into account in one go. A larger window lets you feed in long documents, hold extended conversations, or give detailed instructions without important material falling out of view — which is why bigger windows unlock bigger tasks. But size isn't everything: a huge window can still be filled with irrelevant text that buries what matters, and on most paid services more tokens means higher cost. Using the window deliberately — including the relevant material and leaving out the rest — often beats simply having a large one.
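The cost side of this trade-off is simple arithmetic. The sketch below assumes a single flat price per million tokens; the price and the token counts are hypothetical figures chosen for illustration, and real services often price input and output tokens differently.

```python
def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Return the estimated charge for processing a number of tokens."""
    return tokens * price_per_million / 1_000_000


# Hypothetical figures for illustration only:
PRICE_PER_MILLION = 3.0      # dollars per million tokens
WHOLE_DOCUMENT = 100_000     # tokens if the entire document is sent
RELEVANT_PARTS = 2_000       # tokens if only the relevant passages are sent

print(f"${estimate_cost(WHOLE_DOCUMENT, PRICE_PER_MILLION):.3f}")  # $0.300
print(f"${estimate_cost(RELEVANT_PARTS, PRICE_PER_MILLION):.3f}")  # $0.006
```

Because the same charge tends to apply each time the material is sent, the gap between the two approaches widens with every follow-up question.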