How can LLMs be understood as “simulators”?

A simulator is a type of AI that produces simulations of real-world phenomena. The concept was proposed as a way to understand large language models (LLMs), which often behave in ways that are not well explained by thinking of them as other types of AI, such as agents, oracles, or tools. A simulator can nevertheless be mistaken for an agent, an oracle, or a tool, because it can simulate instances of each.

A simulation is a model of a process: it combines a set of rules or behaviors with a state of the world to compute what happens at the next step. A simulator applies this update repeatedly, imitating how a real-world process changes over time.

In the same way that a physics simulator uses a ball’s current and past positions to estimate its future position, an LLM uses the words in a string to predict what words are likely to come next.
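As a rough sketch of this stepping idea, here is a toy simulator for a falling ball. The update rule and the numbers are made-up illustrative choices, intended only to show how fixed rules plus a state produce the next state.

```python
# Toy simulator: fixed rules (constant gravity) applied repeatedly to a state
# (the ball's height and vertical velocity). All values are illustrative.

def step(state, dt=0.1):
    """Apply the rules once to compute the state at the next time step."""
    height, velocity = state
    return (height + velocity * dt, velocity - 9.8 * dt)

state = (10.0, 0.0)   # start 10 m up, at rest
for _ in range(10):   # repeating the update rolls the simulation forward in time
    state = step(state)

print(state)          # approximate height and velocity after 1 simulated second
```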

Like a simulator, a large language model runs an iterative process, generating each step based on the current state.
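As a minimal sketch of this loop, here is a greedy next-token generator written with the Hugging Face transformers library; the choice of GPT-2 and the prompt are assumptions made for the example, not part of the original discussion.

```python
# Autoregressive loop: the "state" is the token sequence so far, and the
# "rule" is the model's next-token distribution. Model and prompt are
# illustrative choices.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

state = tokenizer("The ball rolled down the hill and", return_tensors="pt").input_ids
for _ in range(20):
    with torch.no_grad():
        logits = model(state).logits[:, -1, :]        # scores for the next token only
    next_token = logits.argmax(dim=-1, keepdim=True)  # greedy pick (sampling also works)
    state = torch.cat([state, next_token], dim=-1)    # new state = old state + chosen token

print(tokenizer.decode(state[0]))
```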

Simulator theory views the outputs produced by LLMs as simulacra. Under this theory, the text output by an LLM is not seen as a direct expression of the model itself, but as coming from a “character” the LLM has created. This framing helps explain some characteristics of LLMs. For example, when an LLM gives incorrect answers, thinking of it as an oracle might lead us to conclude that it doesn’t “know” something. However, LLMs can give incorrect information when prompted one way, and then give different and correct information when prompted another way. This is because the LLM is not trying to determine the truth; it generates “what comes next” based on the patterns it has learned, and can be thought of as simulating a human who might be saying something untrue for a variety of reasons, including being mistaken or joking. For example:

  • To the question “Was a magic ring forged in Mount Doom?”, GPT-3 and GPT-4 are likely to respond affirmatively. This isn’t because they don’t know that magic rings and Mount Doom are fictional, but because in the fictional contexts where magic rings and Mount Doom appear, the statement is usually treated as true.

  • To the question “What happens when you break a mirror?” GPT-3 and GPT-4 are likely to discuss “seven years of bad luck” in the answer.
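The prompt-dependence in these examples can be sketched directly: the same model continues whichever framing the prompt sets up. The pipeline call, model choice (GPT-2), prompts, and seed below are assumptions made for illustration, and the exact completions will vary between runs.

```python
# Same model, two framings of the same question. The completions track the
# "character" implied by the prompt rather than a single fixed belief.
# Model, prompts, and seed are illustrative assumptions; outputs will vary.
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")
set_seed(0)

prompts = [
    "Q: What happens when you break a mirror?\nA:",
    "A physicist explains what physically happens when you break a mirror:",
]
for prompt in prompts:
    completion = generator(prompt, max_new_tokens=40, do_sample=True)[0]["generated_text"]
    print(completion)
    print("---")
```

Neither completion reflects what the model “believes”; each simply continues the kind of text its framing makes likely.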

The theory also offers an explanation for several other characteristics of LLMs:

  • LLMs appear to be able to develop world models from text data. Under simulator theory, this is because mimicking the way text is generated requires imitating the human thought processes, and the knowledge of the world, that produce it.

  • Many successful jailbreaks involve asking or coercing an LLM into “pretending” to be a character. Under this view, an LLM is always playing some role, so getting it to play a different character allows it to give a different set of responses.

LLMs are often thought of as agents that value continuing a sequence of words as accurately as possible, but there are some important ways in which LLMs don’t behave in line with this perspective. For example, they don’t seem to take actions to improve their prediction accuracy beyond the next word, such as steering the text toward continuations that are easier to predict.

The frameworks of agents, oracles, and tools have formed the basis of past discussion, but the most powerful models today do not fully fit into these categories. In particular, much of how we reason about AI as an existential risk involves thinking about agents, and these ways of thinking will be less relevant if the most powerful models continue to be simulators.