Wednesday, May 1, 2024

DeepMind researchers discover impressive learning capabilities in long-context LLMs - Ben Dickson, VentureBeat

In just a few years, large language models (LLMs) have gone from handling a few hundred words of input to several books' worth of content at once. This expanded input capacity, referred to as the "context window," is enabling new applications and use cases that were previously impossible without extensive engineering effort. A new study by researchers at Google DeepMind explores the "many-shot" in-context learning (ICL) abilities of LLMs with very long context windows. Their findings show that by fitting hundreds or even thousands of training examples into the prompt, you can improve a model's abilities in ways that would previously have required fine-tuning.
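To make the idea concrete, here is a minimal Python sketch of what a many-shot prompt might look like in practice. The sentiment-labeling task, the `build_many_shot_prompt` helper, and the `call_model` stub are illustrative assumptions, not details from the paper; the point is simply that the "training data" lives in the prompt itself rather than in a fine-tuning run.

```python
# Minimal sketch of many-shot in-context learning: instead of fine-tuning,
# pack a large number of labeled examples directly into the prompt of a
# long-context model.

def build_many_shot_prompt(examples, query, instruction):
    """Concatenate many input/label pairs ahead of the actual query."""
    shots = "\n\n".join(
        f"Input: {text}\nLabel: {label}" for text, label in examples
    )
    return f"{instruction}\n\n{shots}\n\nInput: {query}\nLabel:"

# Hypothetical labeled data; in the many-shot regime this list would hold
# hundreds or thousands of pairs, limited only by the context window.
examples = [
    ("The service was outstanding.", "positive"),
    ("I waited an hour and then left.", "negative"),
    # ... hundreds more pairs ...
]

prompt = build_many_shot_prompt(
    examples,
    query="The food arrived cold.",
    instruction="Classify the sentiment of each input as positive or negative.",
)

# The assembled prompt would then be sent to a long-context LLM, e.g.:
# response = call_model(prompt)  # call_model is a placeholder for your LLM API
print(prompt)
```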