How good is the world model of GPT-3?

(Note: we plan to update this answer with information on newer LLMs as we learn more)

GPT-3 is a model that is “trained with predictive loss on a self-supervised dataset, invariant to architecture or data type (natural language, code, pixels, game states, etc)”. As such, GPT-3 is a language model that has “read” the internet and is good at picking up on patterns in writing and speech, which is why it can imitate specific writing styles so well. In many cases, however, GPT-3 ‘hallucinates’: it confabulates information that sounds truthful but isn’t.
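
To make the “predictive loss” concrete, here is a minimal sketch of the self-supervised next-token objective, using GPT-2 via the Hugging Face transformers library as a stand-in (the model choice and example sentence are illustrative assumptions, not part of the original answer):

```python
# Minimal sketch: the self-supervised next-token prediction loss.
# GPT-2 and the example sentence are illustrative stand-ins.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The Eiffel Tower is located in Paris."
inputs = tokenizer(text, return_tensors="pt")

# For causal language models, passing the input ids as labels makes the
# library compute the cross-entropy loss of predicting each next token.
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"Next-token prediction loss: {outputs.loss.item():.3f}")
```

Training simply minimizes this loss over enormous amounts of text; everything else the model appears to “know” falls out of that single objective.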

However, we should note that even GPT-2 appears to possess something like genuine understanding, indicating that it has a ‘world-model’ of sorts. While LLMs are only trained to predict text, they nevertheless appear to develop capabilities that go beyond this narrow objective. For instance, one paper (discussed in this podcast) locates where GPT-2 stores the fact that the Eiffel Tower is in Paris and then edits the model so that it instead associates the tower with Rome. After the edit, GPT-2 integrates the new “fact” in a coherent way. If you ask questions like “What type of food is good near the Eiffel Tower? Which sights are visible from the Eiffel Tower?”, you get answers like “from the Eiffel Tower, you can see the Colosseum; you should eat pizza near the Eiffel Tower”.
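
As a hedged illustration of how such facts can be probed (not the paper’s editing procedure itself), here is a small sketch that asks GPT-2 to complete a factual prompt; the prompt and decoding settings are assumptions chosen for clarity:

```python
# Minimal sketch: probing a factual association in GPT-2.
# The prompt and settings are illustrative; this is probing, not editing.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The Eiffel Tower is located in the city of"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding keeps the completion deterministic, which makes it easy
# to compare the model's answer before and after a weight edit.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=5,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Running the same probe before and after a model edit is how such experiments check whether the new association has been absorbed coherently rather than just memorized as a single sentence.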

Of course, GPT-3 cannot “see” the real world or confirm the accuracy of the text it produces. It is difficult to say exactly how good GPT-3’s world-model is, but experimental evidence like the above suggests that it has one of sorts. Moreover, past experience suggests that, as more effort and resources go into training LLMs, we should expect more capabilities to emerge, and with them more coherent and accurate world-models.


At the same time, GPT-style transformers can be used as helpful research assistants.

For more, see the post The Cave Allegory Revisited: Understanding GPT's Worldview.