What are the main sources of AI existential risk?
There are several broad dynamics that seem like plausible contributors to the risk of an AI-caused existential catastrophe.
There are a number of ways that AI could end up behaving dangerously:

- Training could produce a misaligned mesa-optimizer.
- We could accidentally misspecify our goals.
- AIs could be misused.
Additionally, there are features of the world that could make avoiding a disaster harder:

- Insufficient time to solve open technical problems, especially AI alignment.
- A lack of coordination between the most important actors, like AI labs and national governments.
- The acceleration of progress through cheaper computing hardware, algorithmic progress, and increased investment.
One could also categorize risks by the dangerous uses AI could be put to, like locking in undesirable values or inventing powerful weapons. Different types of errors could persist in an AI even as its capabilities became highly advanced, like incorrect assumptions about metaethics, decision theory, or metaphilosophy.
Finally, a post-AGI world could settle into broad patterns in which human values lose influence, like new competitive pressures or extreme concentration of power.