What are large language models?

Large language models (LLMs), such as GPT-4 (which powers ChatGPT), are AI models trained to take a string of text as input and to output a likely continuation of that text. LLMs are trained primarily on massive amounts of textual data, typically taken from the Internet.
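As a rough illustration of what "outputting a likely continuation" means, the sketch below uses the small, openly available GPT-2 model via Hugging Face's transformers library as a stand-in for much larger models like GPT-4, whose weights are not public. The model choice and prompt here are illustrative assumptions, not details from the models discussed above.

```python
# A minimal sketch of next-token prediction, using the small open GPT-2
# model as a stand-in for larger LLMs (an illustrative choice).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

# The model assigns a probability to every possible next token; a "likely
# continuation" is built by repeatedly choosing high-probability tokens.
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {float(p):.3f}")

# Greedily extending the prompt token by token yields a continuation.
output = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(output[0]))
```

Everything an LLM does, from chatting to writing code, is built on repeatedly sampling from this next-token distribution.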

In the course of being trained to predict continuations of text, LLMs have acquired a variety of abilities they were not explicitly trained for, such as solving math problems, translating between languages, performing basic contextual reasoning, finding mistakes in code, recalling information from the large bodies of text they were trained on, and so on. Performance on these tasks tends to improve as the number of parameters in the model increases, and often improves on many different benchmarks simultaneously (see the chart below). Modern LLMs commonly reach or exceed human performance on many of these benchmarks.
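As a concrete (and heavily simplified) probe of one such untrained-for ability, the sketch below prompts the same small GPT-2 model to translate, using a hypothetical prompt format of our choosing. A model this small usually fails at the task; the scaling trend in the chart below is precisely that larger models succeed far more often.

```python
# A sketch of probing translation ability purely through prompting; the
# prompt format is a hypothetical choice, and the tiny GPT-2 model used
# here will usually get it wrong where larger models often succeed.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# No translation-specific training was done; any ability here emerged
# as a side effect of learning to predict text continuations.
prompt = "English: Good morning.\nFrench:"
result = generator(prompt, max_new_tokens=8, do_sample=False)
print(result[0]["generated_text"])
```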

Increased performance of LLMs with increased model size.[1]


  1. Jason Wei et al., CC BY 4.0, via Wikimedia Commons ↩︎