What is instrumental convergence?

3 min read

Suggest changes in Google Docs

Instrumental convergence (sometimes called "basic AI drives"¹) is the thesis that intelligent agents with a wide variety of terminal goals would share certain instrumental goals.

A terminal goal is something valued for its own sake. An instrumental goal is something pursued as a means to achieving a terminal goal.

For example: you might donate money to help the poor. If you learned your donations weren't actually helping anyone, you'd stop. This reveals that "donating" was instrumental — a means to your terminal goal of "improving people’s well-being."

Some instrumental goals are specific to particular terminal goals. If you're thirsty, you fill a glass with water. If you're not thirsty, that action is useless.

But other instrumental goals would be useful for achieving many different terminal goals. Agents with very different terminal goals would "converge" on pursuing these same instrumental goals.

Consider an AI whose only terminal goal is to maximize paperclips. Bostrom names 5 examples of instrumental goals that such an AI would likely pursue:

Self-preservation. A destroyed AI cannot make paperclips. As Stuart Russell put it: "You can't fetch the coffee if you're dead."
Goal-content integrity. If the AI's terminal goal were changed to something other than paperclips, fewer paperclips would get made. So the current goal gives the AI a reason to resist goal modification.
Cognitive enhancement. Better reasoning and planning abilities would help the AI make paperclips more effectively.
Technological perfection. Better technology will improve the efficiency and effectiveness of producing paperclips.
Resource acquisition. Money, raw materials, energy, computing power, and social influence can all be used to produce more paperclips.

These instrumental goals aren't specific to paperclips. An AI trying to cure cancer or prove mathematical theorems would have similar reasons to preserve itself, maintain its goals, improve its cognition, advance its technology, and acquire resources. The instrumental goals are shared because facts about our world — that agents can be destroyed, that goals can be altered, that resources are useful and limited — apply regardless of what terminal goal an agent has.

This reasoning applies even to terminal goals that sound limited in scope. More resources, more power, and fewer threats all increase the probability of success, whatever "success" means for that agent.

Instrumental convergence means that a goal-directed AI would by default want to seize control of the world. It’s a major part of what makes superintelligence dangerous.

Steve Omohundro, "The Basic AI Drives" (2008). ↩︎

Why can't we just turn the AI off if it starts to misbehave?

What is an agent?

What is the orthogonality thesis?