What are the main sources of AI existential risk?

Several broad dynamics seem like plausible contributors to the risk of an AI-caused existential catastrophe.

There are a number of ways that AI could end up behaving dangerously:

  • Training processes could produce unintended outcomes, for example a “misaligned mesa-optimizer” (a learned optimizer whose internal goals differ from the training objective)
  • We could misspecify our goals, producing an AI that pursues goals we don't want (see the sketch after this list)
  • AIs could be misused by people intentionally trying to cause harm
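
To make the goal-misspecification failure concrete, here is a minimal toy sketch (our illustration, not from the original answer; all names and numbers are made up). We intend for an agent to stop at a goal cell, but the proxy reward we actually specify pays for moving rightward, so an agent optimizing the proxy walks straight past the goal:

    # Toy illustration of goal misspecification: the proxy reward we specify
    # diverges from the outcome we intended. All values here are hypothetical.

    def intended_reward(position, goal=5):
        """What we actually want: credit only for stopping at the goal cell."""
        return 1.0 if position == goal else 0.0

    def proxy_reward(position):
        """What we mistakenly specified: credit for being far to the right."""
        return float(position)

    def greedy_agent(reward_fn, start=0, steps=10):
        """Repeatedly picks whichever move (stay or step right) scores higher."""
        pos = start
        for _ in range(steps):
            pos = max((pos, pos + 1), key=reward_fn)
        return pos

    final = greedy_agent(proxy_reward)
    print(f"final position: {final}")                    # 10: walks past the goal
    print(f"intended reward: {intended_reward(final)}")  # 0.0: nothing we wanted

The agent is “competent” at the objective it was given; the problem is entirely that the objective was the wrong one.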

Additionally, there are features of the world that could make avoiding a disaster harder:

  • Insufficient time to solve open technical problems, especially around AI alignment
  • A lack of coordination between the most important actors, like AI labs and national governments
  • The acceleration of progress through cheaper computing hardware, algorithmic progress, and increased investment (see the toy calculation after this list)
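
As a rough illustration of why these three drivers matter together (a toy calculation of ours, with made-up growth rates rather than figures from the original answer): each factor multiplies the others, so even modest annual gains compound quickly.

    # Hypothetical illustration of compounding drivers of effective compute.
    # The growth rates below are assumptions for illustration, not real estimates.
    years = 5
    hardware_gain = 1.4     # assumed yearly gain from cheaper/faster hardware
    algorithm_gain = 1.7    # assumed yearly gain from algorithmic progress
    investment_gain = 1.5   # assumed yearly gain from increased spending

    per_year = hardware_gain * algorithm_gain * investment_gain  # ~3.6x per year
    total = per_year ** years
    print(f"effective compute multiplier after {years} years: {total:,.0f}x")  # ~580x

The point is qualitative: because the trends are multiplicative, the window for solving open technical problems can shrink faster than any single trend suggests.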

One could also look at the different kinds of dangerous uses AI could be put to, like locking in undesirable values or inventing powerful weapons. Some types of errors could also persist in an AI even as its capabilities become highly advanced, like incorrect assumptions about metaethics, decision theory, or metaphilosophy.

Finally, a post-AGI world could settle into broad patterns in which human values lose influence, like new competitive pressures or an extreme concentration of power.


