LLM Context Window Size: The New Moore’s Law?

At a recent panel on AI startups, an aspiring entrepreneur asked me to what extent current LLM context window sizes are a limitation on certain LLM use cases and might constrain certain AI startups.

In layman’s terms, “context window size” (also referred to as “context length”) can be thought of as an LLM’s “short-term memory”.

In the case of a chat agent, it constrains the amount of prior conversation history that the chat agent can look back on to figure out what to say next. The smaller the context window, the more “forgetful” of earlier conversation the agent appears to be.
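To make that “forgetfulness” concrete, here is a minimal sketch of the naive approach: keep only as many recent turns as fit in a fixed token budget. The window size, the token counter, and the function names are illustrative assumptions, not any particular vendor’s API.

```python
# Minimal sketch (not any vendor's actual API) of why a small context window
# makes a chat agent "forgetful": only as much recent history as fits in the
# window can be sent with each request. Token counting here is a crude
# whitespace split; real systems use the model's tokenizer.

CONTEXT_WINDOW_TOKENS = 4_096   # assumed window size for illustration
RESERVED_FOR_REPLY = 512        # leave room for the model's answer

def count_tokens(text: str) -> int:
    return len(text.split())

def fit_history(messages: list[str]) -> list[str]:
    """Keep the newest messages that fit in the budget; older ones are 'forgotten'."""
    budget = CONTEXT_WINDOW_TOKENS - RESERVED_FOR_REPLY
    kept: list[str] = []
    for message in reversed(messages):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return list(reversed(kept))

history = [f"turn {i}: " + "word " * 200 for i in range(40)]
print(f"kept {len(fit_history(history))} of {len(history)} turns")
```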

My answer, on which the panel was broadly unanimous, was that context window size is not a constraint to worry about.


Workarounds

Firstly, there are workarounds to the context window limit which are effective for most use cases.

Extending the above analogy, the common approach here is to shunt the things that need to be remembered to “long-term memory”, perhaps in the form of a traditional database or a vector database, and then call them back into the LLM’s “short-term memory” as needed. This definitely introduces some additional complexity in implementation but it works.
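As a rough illustration of that pattern, here is a toy sketch in Python: facts are written to a “long-term memory” store and the most relevant ones are recalled and re-inserted into the prompt. The hashing embed() function and the in-memory list are placeholders standing in for a real embedding model and a real vector database.

```python
# Toy illustration of the "long-term memory" workaround: past facts are stored
# outside the model and the most relevant ones are recalled into the prompt.
# The hashing embed() and the in-memory list are placeholders for a real
# embedding model and a real vector database.

import hashlib
import math

DIM = 64

def embed(text: str) -> list[float]:
    """Toy embedding: hash each word into a bucket of a fixed-size vector."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        word = word.strip(".,?!")
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

class LongTermMemory:
    def __init__(self) -> None:
        self.items: list[tuple[list[float], str]] = []

    def remember(self, text: str) -> None:
        self.items.append((embed(text), text))

    def recall(self, query: str, k: int = 2) -> list[str]:
        query_vec = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vec, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

memory = LongTermMemory()
memory.remember("The user prefers metric units.")
memory.remember("The user is building a hiking app.")
memory.remember("The user dislikes verbose answers.")

# Recalled snippets would be prepended to the prompt, i.e. loaded back into
# the model's "short-term memory" before the next call.
print(memory.recall("What units should distances be shown in?"))
```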

However, the bigger point is that context window size appears to be growing incredibly quickly so it is unlikely to be a constraint, other than in extreme edge cases.


By The Numbers

I wanted to get some specifics here so I collated a table of some of the most well-known LLMs with their release date and context window sizes.

It turns out that this data is surprisingly hard to collate. Firstly, it’s hard to find historical data on the earlier models. Secondly, popular models are frequently revised, and sub-versions released, with increases (and sometimes decreases) in context window size between sub-versions. Lastly, lightweight and/or faster versions of models (“turbo” versions for OpenAI) are released after the original versions, but can have smaller context windows, presumably to increase performance. (You can see the data table at the bottom below.)

With those caveats, here’s a scatter plot of context window size against release date for a bunch of well-known LLMs from Amazon, Anthropic, Google, Meta, OpenAI, and X.ai.

[Scatter plot: context window size in tokens (log scale) vs. initial release date, by vendor]

The first thing to note here is that the Y-axis is logarithmic.

Although there are gaps and variances between vendors, it appears that we’ve been seeing more or less consistent exponential growth over time.

Specifically, in the 6 years from 2019 to 2025, we’ve seen an increase in context window size of roughly 3 orders of magnitude, i.e. a ~1,000x increase.


A New Moore's Law?

In 1975, Gordon Moore, a co-founder of Fairchild Semiconductor and Intel, predicted that the number of components per integrated circuit would continue to double every two years. This prediction came to be known as “Moore’s Law” and has more or less held true until today.

Are we seeing a similar phenomenon now? Will this trend continue?


Is there a new “law” here whereby context window size will continue to grow by an order of magnitude (10x) every 2 years?
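As a quick sanity check on that implied rate, here is a back-of-the-envelope calculation using two endpoints from the data table below (GPT-2 and Gemini 2.0). The choice of endpoints is my own, and the point is only the rough rate, not a precise law.

```python
# Back-of-the-envelope check of the growth rate implied by the data table below.
# The endpoints (GPT-2 and Gemini 2.0) are my own choice; other pairs give
# broadly similar rates.

start_tokens, start_year = 1_024, 2019.1        # GPT-2, February 2019
end_tokens, end_year = 1_048_576, 2024.9        # Gemini 2.0, December 2024

years = end_year - start_year
total_growth = end_tokens / start_tokens         # overall multiple (~1,024x)
per_year = total_growth ** (1 / years)           # compound growth factor per year
per_two_years = per_year ** 2

print(f"~{total_growth:,.0f}x over {years:.1f} years")
print(f"≈ {per_year:.1f}x per year, ≈ {per_two_years:.0f}x every two years")
```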


Why Does it Matter?

To recap, a larger context window means a longer “short-term memory”.

An example of a simple benefit of a larger context window is the ability to summarize larger amounts of text in a single invocation of the LLM, without having to split the text into chunks and then cross-correlate insights across those chunks.
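For illustration, here is a minimal sketch of the chunk-and-merge pattern that a small window forces, versus the single call a large window allows. summarize() is a placeholder for a real LLM call, and the window size and token counting are crude assumptions.

```python
# Sketch of the chunk-and-merge pattern a small context window forces, versus
# the single call a large window allows. summarize() is a placeholder for a
# real LLM call; window size and token counting are crude assumptions.

CONTEXT_WINDOW_TOKENS = 8_192

def count_tokens(text: str) -> int:
    return len(text.split())

def summarize(text: str) -> str:
    """Stand-in for one LLM invocation."""
    return "summary of: " + text[:40] + "..."

def summarize_document(document: str) -> str:
    if count_tokens(document) <= CONTEXT_WINDOW_TOKENS:
        # Large enough window: one call, no cross-chunk bookkeeping.
        return summarize(document)
    # Too small: split into window-sized chunks, summarize each, then
    # summarize the concatenated partial summaries.
    words = document.split()
    chunks = [" ".join(words[i:i + CONTEXT_WINDOW_TOKENS])
              for i in range(0, len(words), CONTEXT_WINDOW_TOKENS)]
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partial_summaries))

print(summarize_document("long report " * 20_000))
```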

However, in the broadest sense, the larger the context window, the more of an application’s state can be kept in the “short-term memory”. Ultimately, each user’s entire interaction from their very first touch can be held in the LLM’s context window, eliminating entirely the need for any “long-term memory” in the form of separate databases and systems-of-record.
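Taken to its logical extreme, the application keeps no separate store at all: the full transcript is the state and is re-sent to the model on every turn. A toy sketch, with call_llm() standing in for a real model API:

```python
# Sketch of the "everything in the context window" extreme: no database at all;
# the full transcript is the application state and is re-sent on every turn.
# call_llm() is a placeholder for a real model API.

def call_llm(prompt: str) -> str:
    return "reply based on: " + prompt[-40:]

transcript: list[str] = []   # the only "system of record"

def handle_user_turn(user_message: str) -> str:
    transcript.append("User: " + user_message)
    reply = call_llm("\n".join(transcript))   # entire history goes in each time
    transcript.append("Assistant: " + reply)
    return reply

print(handle_user_turn("Hi, I'm setting up my account."))
print(handle_user_turn("What did I just say?"))
```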


The Cost

Of course, increases in context window size don’t come without costs.

As the amount of data sent to the LLM grows, the computational cost of executing the LLM also grows. Additionally, if we truly want to use LLM state as the source-of-truth and permanent record for an application, then we have to persist that LLM state indefinitely and securely.
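To get a feel for the cost side, here is a toy estimate of how the bill scales with the amount of context sent per call. The per-token price is a made-up placeholder, not any vendor’s actual rate.

```python
# Toy cost model: input cost scales linearly with the context sent on each call.
# PRICE_PER_MILLION_INPUT_TOKENS is a hypothetical placeholder, not a real rate.

PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # dollars, assumed for illustration

def input_cost(context_tokens: int, calls: int) -> float:
    return context_tokens * calls * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000

for window in (8_192, 128_000, 1_048_576):
    # e.g. 1,000 calls that each use the full window
    print(f"{window:>9,} tokens/call x 1,000 calls ≈ ${input_cost(window, 1_000):,.2f}")
```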

But, we can probably bet that these issues will also diminish over time as we see improvements in model architecture, LLM algorithms, and, of course, Moore’s Law itself!


The Data

Model | Initial Release Date | Context Window Size (tokens)
OpenAI GPT-1 | June 2018 | 512
OpenAI GPT-2 | February 2019 | 1,024
OpenAI GPT-3 | June 2020 | 2,048
OpenAI GPT-4 | March 2023 | 8,192
OpenAI GPT-4 32K | March 2023 | 32,768
Anthropic Claude 1.2 | May 2023 | 100,000
Anthropic Claude 2 | July 2023 | 100,000
OpenAI GPT-3.5 Turbo | August 2023 | 16,385
OpenAI GPT-4 Turbo | November 2023 | 128,000
Anthropic Claude 2.1 | November 2023 | 200,000
Google Gemini 1.0 | December 2023 | 32,000
Google Gemini 1.5 | February 2024 | 1,048,576
Anthropic Claude 3 Opus | March 2024 | 200,000
Google Gemini 1.5 Flash | May 2024 | 1,048,576
Google Gemini 1.5 Pro | June 2024 | 2,097,152
Anthropic Claude 3.5 Sonnet | June 2024 | 200,000
X.ai Grok 2 | August 2024 | 128,000
Google Gemini 2.0 | December 2024 | 1,048,576
Meta Llama 3.3 | December 2024 | 128,000
Amazon Nova Pro | December 2024 | 300,000
OpenAI o3 | January 2025 | 200,000

It’s interesting to note here that not all of the vendors are growing their context windows at the same rate. Google has been the most aggressive whereas, at the other end of the scale, Anthropic has kept context window size relatively flat.

I’m guessing this is a deliberate choice rather than a limitation in capability. Different vendors have different philosophies on LLM usage and are optimizing for different use cases.

Comments

5 days ago

Entrepreneur of joyobi / Joyful, Sustainable Homebuilding | Redefining the Experience of Building Your Own Home @ joyobi

While hardware advancements like packaging and parallel processing provide a clear path forward, the real challenge post-2028 may lie in data scarcity. Synthetic data offers a potential workaround, but its limitations, such as the risk of model collapse, highlight the irreplaceable value of high-quality, diverse datasets. As context windows expand, we need scalable strategies to ensure that the AI revolution doesn’t outpace its foundational resources.

- Nicola Jones, “The AI revolution is running out of data. What can researchers do?”, Nature, 2024

6 days ago

@ Mr Jackdaw Company

Excellent point, and thanks for gathering the data. It mirrors the view (with which I agree) that the future will be ruled by a few highly specialized LLMs, with knock-offs that aim to replicate this specialized functionality at lower prices.


Side note: I still believe someone should be standardizing an interface for interacting with on-device LLMs, such that e.g. an iOS app developer has a predictable way to connect to [the system-default LLM].