At a recent panel on AI startups, an aspiring entrepreneur asked me to what extent current LLM context window sizes are a limitation on certain LLM use cases and might constrain AI startups.
In layman’s terms, “context window size” (also referred to as “context length”) can be thought of as an LLM’s “short-term memory”.
In the case of a chat agent, it constrains the amount of prior conversation history that the chat agent can look back on to figure out what to say next. The smaller the context window, the more “forgetful” of earlier conversation the agent appears to be.
My answer to the question, on which the panel was broadly unanimous, was that context window size is not a constraint to worry about.
Workarounds
Firstly, there are workarounds for the context window limit that are effective for most use cases.
Extending the above analogy, the common approach here is to shunt the things that need to be remembered into “long-term memory”, perhaps in the form of a traditional database or a vector database, and then call them back into the LLM’s “short-term memory” as needed. This introduces some additional implementation complexity, but it works.
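As a minimal sketch of that pattern (purely illustrative: a toy word-overlap score stands in for a real embedding model and vector database), the idea looks roughly like this in Python:

```python
# Toy illustration of the "long-term memory" workaround: older turns live
# outside the prompt, and only the most relevant ones are pulled back into
# the context window. A real system would use embeddings and a vector store;
# a simple word-overlap score stands in for semantic similarity here.

def overlap_score(query: str, text: str) -> float:
    """Crude relevance score: fraction of query words that appear in the text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / max(len(q_words), 1)

def build_prompt(history: list[str], query: str, max_turns: int = 3) -> str:
    """Recall the most relevant past turns and prepend them to the new query."""
    ranked = sorted(history, key=lambda turn: overlap_score(query, turn), reverse=True)
    recalled = ranked[:max_turns]  # "long-term memory" recalled into the context window
    return "\n".join(["Relevant earlier conversation:"] + recalled + ["", "User: " + query])

history = [
    "User asked about shipping costs to Canada.",
    "User mentioned their order number is 1234.",
    "User prefers email over phone contact.",
]
print(build_prompt(history, "What was my order number again?"))
```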
However, the bigger point is that context window size appears to be growing incredibly quickly so it is unlikely to be a constraint, other than in extreme edge cases.
By The Numbers
I wanted to get some specifics here so I collated a table of some of the most well-known LLMs with their release date and context window sizes.
It turns out that this data is surprisingly hard to collate. Firstly, it’s hard to find historical data on the earlier models. Secondly, popular models are frequently revised, and sub-versions released, with increases (and sometimes decreases) in context window size between sub-versions. Lastly, lightweight and/or faster versions of models (“turbo” versions for OpenAI) are released after the original versions, but can have smaller context windows, presumably to increase performance. (You can see the data table at the bottom of this post.)
With those caveats, here’s a scatter plot of context window size against release date for a bunch of well-known LLMs from Amazon, Anthropic, Google, Meta, OpenAI, and X.ai.
The first thing to note here is that the Y-axis is logarithmic.
Although there are gaps and variances between vendors, it appears that we’ve been seeing more or less consistent exponential growth over time.
Specifically, in the 6 years from 2019 to 2025, we’ve seen an increase in context window size of roughly 3 orders of magnitude - i.e. a ~1,000x increase.
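To spell out that arithmetic, here’s a quick back-of-the-envelope calculation using only the rough figures above:

```python
import math

# Rough figures from the post: ~1,000x growth in context window size
# over the 6 years from 2019 to 2025.
growth = 1_000
years = 6

orders_of_magnitude = math.log10(growth)        # 3.0
years_per_10x = years / orders_of_magnitude     # 2.0 years per 10x increase
doubling_time = years / math.log2(growth)       # ~0.6 years per doubling

print(f"~{years_per_10x:.1f} years per 10x increase")
print(f"~{doubling_time:.1f} years per doubling")
```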
A New Moore's Law?
In 1975, Gordon Moore, the co-founder of Fairchild Semiconductor and Intel, predicted that the number of components per integrated circuit would continue to double every two years. This prediction came to be known as “Moore’s Law” and has more or less held true until today.
Are we seeing a similar phenomenon now? Will this trend continue?
Is there a new “law” here whereby context window size will continue to grow by an order of magnitude (10x) every 2 years?
Why Does it Matter?
To recap, a larger context window means a longer “short-term memory”.
An example of a simple benefit of a larger context window is the ability to summarize larger amounts of text in a single invocation of the LLM, without having to segment the text into chunks and then cross-correlate insights across those chunks.
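For reference, here’s a minimal sketch of that chunk-and-merge workaround; the `summarize` function is a placeholder for a real LLM call (it simply truncates here so the example is self-contained):

```python
# Sketch of the chunk-and-merge ("map-reduce") summarization workaround that a
# large context window makes unnecessary.

def summarize(text: str, limit: int = 80) -> str:
    # Placeholder for an LLM summarization call; truncation keeps the example runnable.
    return text[:limit]

def chunked_summary(document: str, chunk_size: int = 2_000) -> str:
    # Map step: split the document into chunks that each fit the context window,
    # then summarize each chunk independently.
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: summarize the concatenated partial summaries. Insights that
    # span chunk boundaries can be lost here, which is the cost of this approach.
    return summarize("\n".join(partial_summaries))

print(chunked_summary("some very long document " * 1_000))
```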
However, in the broadest sense, the larger the context window, the more of an application’s state can be kept in the “short-term memory”. Ultimately, each user’s entire interaction from their very first touch can be held in the LLM’s context window, eliminating entirely the need for any “long-term memory” in the form of separate databases and systems-of-record.
The Cost
Of course, increases in context window size don’t come without costs.
As the amount of data sent to the LLM grows, the computational cost of executing the LLM also grows. Additionally, if we truly want to use LLM state as the source-of-truth and permanent record for an application, then we have to persist that LLM state indefinitely and securely.
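As a rough illustration of the first point, input cost scales more or less linearly with how much of the context window you actually fill. The per-token price below is an assumption for illustration only, not any vendor’s actual pricing:

```python
# Rough cost illustration: input-token cost grows linearly with the amount of
# context sent per call. The price is assumed for illustration, not a real quote.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # assumed, in dollars

def prompt_cost(tokens_in_context: int) -> float:
    return tokens_in_context / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS

for tokens in (8_192, 128_000, 1_000_000):
    print(f"{tokens:>9,} tokens in context -> ${prompt_cost(tokens):.2f} per call")
```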
But, we can probably bet that these issues will also diminish over time as we see improvements in model architecture, LLM algorithms, and, of course, Moore’s Law itself!
The Data
| Model | Initial Release Date | Context Window Size (tokens) |
| --- | --- | --- |
| OpenAI GPT-1 | June 2018 | 512 |
| OpenAI GPT-2 | February 2019 | 1,024 |
| OpenAI GPT-3 | June 2020 | 2,048 |
| OpenAI GPT-4 | March 2023 | 8,192 |
| OpenAI GPT-4 32K | March 2023 | 32,768 |
| Anthropic Claude 1.2 | May 2023 | 100,000 |
| Anthropic Claude 2 | July 2023 | 100,000 |
| OpenAI GPT-3.5 Turbo | August 2023 | 16,385 |
| OpenAI GPT-4 Turbo | November 2023 | 128,000 |
| Anthropic Claude 2.1 | November 2023 | 200,000 |
| Google Gemini 1.0 | December 2023 | 32,000 |
| Google Gemini 1.5 | February 2024 | 1,048,576 |
| Anthropic Claude 3 Opus | March 2024 | 200,000 |
| Google Gemini 1.5 Flash | May 2024 | 1,048,576 |
| Google Gemini 1.5 Pro | June 2024 | 2,097,152 |
| Anthropic Claude 3.5 Sonnet | June 2024 | 200,000 |
| X.ai Grok 2 | August 2024 | 128,000 |
| Google Gemini 2.0 | December 2024 | 1,048,576 |
| Meta Llama 3.3 | December 2024 | 128,000 |
| Amazon Nova Pro | December 2024 | 300,000 |
| OpenAI o3 | January 2025 | 200,000 |
It’s interesting to note here that not all of the vendors are growing their context windows at the same rate. Google has been the most aggressive whereas, at the other end of the scale, Anthropic has kept context window size relatively flat.
I’m guessing this is a deliberate choice rather than a limitation in capability. Different vendors have different philosophies on LLM usage and are optimizing for different use cases.
While hardware advancements like packaging and parallel processing provide a clear path forward, the real challenge post-2028 may lie in data scarcity. Synthetic data offers a potential workaround, but its limitations, such as the risk of model collapse, highlight the irreplaceable value of high-quality, diverse datasets. As context windows expand, we need scalable strategies to ensure that the AI revolution doesn’t outpace its foundational resources.
- Nicola Jones, “The AI revolution is running out of data. What can researchers do?”, Nature, 2024
Excellent point, and thanks for gathering the data. It mirrors the view (with which I agree) that the future will be ruled by a few highly specialized LLMs, with cheaper knock-offs that aim to replicate their specialized functionality at lower prices.
Side note: I still believe someone should be standardizing an interface for interacting with on-device LLMs, such that e.g. an iOS app developer has a predictable way to connect to [the system-default LLM].