Why would misaligned AI pose a threat that we can’t deal with?
Human civilization is pretty robust. New technologies, cultural changes, and malign actors have sometimes caused great harm, but in many cases we've adapted to the consequences better than expected. Even the worst disasters haven't irreversibly ruined human civilization (though some have come close).
But a misaligned AI, if sufficiently powerful, would want to prevent us from interfering with its plans, in order to make the consequences of its actions permanent. Greater-than-human intelligence would let it improve itself, invent new technologies far faster than humans can, and out-strategize us by thinking and adapting its plans at speeds we can't comprehend.
Such a system's intent and ability to take over, together unprecedented in any past technology, may prevent us from relying on a strategy of trial and error, in which we gradually learn lessons from dealing with weaker misaligned systems and apply them to stronger ones. We don't know how quickly systems will gain capabilities, whether solutions that work on weaker systems will generalize to stronger ones, or how to coordinate among AI developers so that they experiment only in small incremental steps. We may need to succeed on the first critical try; failure could mean an AI takeover whose consequences are irreversible.
Proposed responses include deploying AI only in limited contexts or relying on competition between different kinds of misaligned agents to produce a good outcome, but these approaches have substantial problems of their own.