What are large language models?

Large language models (LLMs), such as GPT-4 (which powers ChatGPT), are AI models trained to take a string of text as input and to output a likely continuation of that text. LLMs are trained primarily on massive amounts of textual data, typically taken from the Internet.
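As a rough illustration of what "outputting a likely continuation" means, the sketch below uses the small, openly available GPT-2 model via Hugging Face's transformers library as a stand-in for much larger models like GPT-4, whose weights are not public. The model choice and prompt here are illustrative assumptions, not details from the models discussed above.

```python
# A minimal sketch of next-token prediction, using the small open GPT-2
# model as a stand-in for larger LLMs (an illustrative choice).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

# The model assigns a probability to every possible next token; a "likely
# continuation" is built by repeatedly choosing high-probability tokens.
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {float(p):.3f}")

# Greedily extending the prompt token by token yields a continuation.
output = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(output[0]))
```

Everything an LLM does, from chatting to writing code, is built on repeatedly sampling from this next-token distribution.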

In the course of being trained to predict continuations of text, LLMs have acquired a variety of abilities they were not explicitly trained for, such as solving math problems, translating between languages, performing basic contextual reasoning, finding mistakes in code, recalling information from the large bodies of text they were trained on, and so on. Performance on these tasks tends to improve as the number of parameters in the model increases, and often improves on many different benchmarks simultaneously (see the chart below). Modern LLMs commonly reach or exceed human performance on many of these benchmarks.
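As a concrete (and heavily simplified) probe of one such untrained-for ability, the sketch below prompts the same small GPT-2 model to translate, using a hypothetical prompt format of our choosing. A model this small usually fails at the task; the scaling trend in the chart below is precisely that larger models succeed far more often.

```python
# A sketch of probing translation ability purely through prompting; the
# prompt format is a hypothetical choice, and the tiny GPT-2 model used
# here will usually get it wrong where larger models often succeed.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# No translation-specific training was done; any ability here emerged
# as a side effect of learning to predict text continuations.
prompt = "English: Good morning.\nFrench:"
result = generator(prompt, max_new_tokens=8, do_sample=False)
print(result[0]["generated_text"])
```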

Increased performance of LLMs with increased model size.[1]


  1. Jason Wei et al., CC BY 4.0, via Wikimedia Commons ↩︎