What does it mean for an AI to have goals?

Merriam-Webster defines a goal as "the end toward which effort is directed." Under Daniel Dennett's intentional stance, a system has goals if, viewed from the outside, it seems to be reliably acting to bring about some state of the world. Ascribing a goal in this way does not imply that one could inspect such a goal-laden AI and find an explicit "goal slot" that specifies that goal in an intelligible form.
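To make the distinction concrete, here is a minimal toy sketch (the agent classes, the one-dimensional world, and the target position are illustrative assumptions, not drawn from any real system): both agents below look equally goal-directed from the outside, but only one stores its goal in an explicit, inspectable slot.

```python
class ExplicitGoalAgent:
    """Has an inspectable 'goal slot': a target position it moves toward."""

    def __init__(self, goal):
        self.goal = goal  # the goal is stored explicitly and legibly

    def act(self, position):
        # Step one unit toward the stored goal coordinate.
        return 1 if position < self.goal else -1 if position > self.goal else 0


class OpaquePolicyAgent:
    """No goal slot: behavior comes from an arbitrary lookup table
    (standing in for learned weights), yet it still reliably reaches 5."""

    def __init__(self):
        # Pretend these entries were produced by training, not by design.
        self._policy = {p: (1 if p < 5 else -1 if p > 5 else 0) for p in range(10)}

    def act(self, position):
        return self._policy[position]


def run(agent, position=0, steps=10):
    for _ in range(steps):
        position += agent.act(position)
    return position


# Under the intentional stance, both agents "want" to be at position 5:
print(run(ExplicitGoalAgent(goal=5)))  # 5
print(run(OpaquePolicyAgent()))        # 5, yet no goal is written down anywhere
```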

An agent is said to be goal-directed if it seems to have goals in this sense. Proponents of shard theory argue that AIs, like other agents, can have multiple context-activated goals that may conflict in certain situations, as sketched below.
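A rough way to picture this claim: in the sketch below, each "shard" only bids on actions in the contexts that activate it, and conflicts appear when several shards are active at once. The shard names, contexts, and scoring rule are invented for illustration and loosely inspired by shard theory's idea of contextually activated decision influences, not taken from any particular formalization.

```python
def juice_shard(context, action):
    # Activates only when juice is visible; prefers approaching it.
    if not context["juice_visible"]:
        return 0.0
    return 1.0 if action == "approach_juice" else 0.0


def safety_shard(context, action):
    # Activates only when danger is present; prefers retreating.
    if not context["danger_present"]:
        return 0.0
    return 1.5 if action == "retreat" else 0.0


SHARDS = [juice_shard, safety_shard]
ACTIONS = ["approach_juice", "retreat", "wait"]


def choose_action(context):
    # Each active shard contributes a score; conflicts are resolved by summation.
    return max(ACTIONS, key=lambda a: sum(shard(context, a) for shard in SHARDS))


# Only one shard active: behavior looks cleanly goal-directed.
print(choose_action({"juice_visible": True, "danger_present": False}))  # approach_juice
# Both shards active: the goals conflict, and the stronger shard wins.
print(choose_action({"juice_visible": True, "danger_present": True}))   # retreat
```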

The orthogonality thesis holds that an AI's level of intelligence is largely independent of its final goals: if AIs develop goals, those goals need not resemble human goals. However, if an AI is trained mostly on human-generated data, the goals it develops may be more likely to resemble human ones. There is ongoing debate about whether this is happening with LLMs.