What does it mean for an AI to have goals?

Merriam-Webster defines a goal as "the end toward which effort is directed." Under Daniel Dennett's intentional stance, a system has goals if, viewed from the outside, it seems to be reliably acting to bring about some state of the world. Ascribing a goal in this way does not imply that one could inspect such a goal-laden AI and find an explicit "goal slot" that specifies that goal in an intelligible form.
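To make the distinction concrete, here is a minimal toy sketch (the agent classes, the one-dimensional world, and the target position are illustrative assumptions, not drawn from any real system): both agents below look equally goal-directed from the outside, but only one stores its goal in an explicit, inspectable slot.

```python
class ExplicitGoalAgent:
    """Has an inspectable 'goal slot': a target position it moves toward."""

    def __init__(self, goal):
        self.goal = goal  # the goal is stored explicitly and legibly

    def act(self, position):
        # Step one unit toward the stored goal coordinate.
        return 1 if position < self.goal else -1 if position > self.goal else 0


class OpaquePolicyAgent:
    """No goal slot: behavior comes from an arbitrary lookup table
    (standing in for learned weights), yet it still reliably reaches 5."""

    def __init__(self):
        # Pretend these entries were produced by training, not by design.
        self._policy = {p: (1 if p < 5 else -1 if p > 5 else 0) for p in range(10)}

    def act(self, position):
        return self._policy[position]


def run(agent, position=0, steps=10):
    for _ in range(steps):
        position += agent.act(position)
    return position


# Under the intentional stance, both agents "want" to be at position 5:
print(run(ExplicitGoalAgent(goal=5)))  # 5
print(run(OpaquePolicyAgent()))        # 5, yet no goal is written down anywhere
```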

An agent is said to be goal-directed if it seems to have goals in this sense. Proponents of shard theory argue that AIs, like other agents, can have multiple context-activated goals that may conflict in certain situations, as sketched below.
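A rough way to picture this claim: in the sketch below, each "shard" only bids on actions in the contexts that activate it, and conflicts appear when several shards are active at once. The shard names, contexts, and scoring rule are invented for illustration and loosely inspired by shard theory's idea of contextually activated decision influences, not taken from any particular formalization.

```python
def juice_shard(context, action):
    # Activates only when juice is visible; prefers approaching it.
    if not context["juice_visible"]:
        return 0.0
    return 1.0 if action == "approach_juice" else 0.0


def safety_shard(context, action):
    # Activates only when danger is present; prefers retreating.
    if not context["danger_present"]:
        return 0.0
    return 1.5 if action == "retreat" else 0.0


SHARDS = [juice_shard, safety_shard]
ACTIONS = ["approach_juice", "retreat", "wait"]


def choose_action(context):
    # Each active shard contributes a score; conflicts are resolved by summation.
    return max(ACTIONS, key=lambda a: sum(shard(context, a) for shard in SHARDS))


# Only one shard active: behavior looks cleanly goal-directed.
print(choose_action({"juice_visible": True, "danger_present": False}))  # approach_juice
# Both shards active: the goals conflict, and the stronger shard wins.
print(choose_action({"juice_visible": True, "danger_present": True}))   # retreat
```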

The orthogonality thesis holds that an AI's level of intelligence is largely independent of its final goals: if AIs develop goals, those goals need not resemble human goals. However, if an AI is trained mostly on human-generated data, the goals it develops may be more likely to resemble human ones. There is ongoing debate about whether this is happening with LLMs.