What are the differences between AI safety, AI alignment, AI control, Friendly AI, AI ethics, AI existential safety, and AGI safety?
There are a variety of terms that mean something like "making AI go well." The distinctions between these terms are vague, but loosely speaking, the meanings are as follows:
- AI safety means preventing harm from AI. This often refers to avoiding existential risks, which is how we use it on this website. It can also encompass smaller-scale risks, like accidents caused by self-driving cars or harmful text produced by language models. People sometimes use “AI existential safety” to refer specifically to risks at the level of human extinction.
- AI alignment means getting AI to pursue the right goals; the problem of accomplishing this is known as the “alignment problem”. “AI alignment” often refers to “intent alignment”, according to which an AI is aligned if it’s trying to do what its operator wants it to do. Others use “AI alignment” for the broader problem of making powerful AI go well, but still emphasize that getting AI “on our side” is the core issue.[^1]
- AI ethics broadly refers to the project of making sure that AI systems are designed and used in ethical ways. In practice, the term is associated with concerns about the harmful societal impacts of current-day AI, such as algorithmic bias against marginalized groups, poor treatment of crowd workers used in training AI, the environmental impacts of AI, and artists losing their livelihood to generative algorithms. The overarching principles guiding this work are fairness, accountability, and transparency. While there is some overlap between AI ethics and AI alignment research, AI ethics researchers have often been critical of AI safety research that focuses on existential risk at the expense of addressing current harms.
- AI governance refers to the institutions and norms that coordinate the development and deployment of AI. Like technical AI safety, AI governance aims to prevent disastrous outcomes from AI, but it focuses on the social context rather than the technical problems, dealing with questions like preventing misuse, implementing good safety practices, and keeping dangerously misaligned systems from being deployed.
Terms that are used less often include:
- AI control (and the “control problem”) is a term that was sometimes used roughly synonymously with “AI alignment” (and the “alignment problem”), though it is less commonly used now. Some people use the term “AI control” to encompass all potential methods of preventing AI systems from behaving dangerously, including incentivizing and constraining them (“capability control”), and use “AI alignment” only to refer to giving AI the right internal values (“motivation selection”).
- Friendly AI (FAI) is a term that was used in early work by MIRI[^2] but is no longer in use. It informally referred to AI that acts benevolently toward humans, for example by pursuing “coherent extrapolated volition”, or some other specification of the values of humanity as a whole, as its highest goal.
- AI notkilleveryoneism is a term Eliezer Yudkowsky and others have used facetiously to refer to the project of preventing AI from exterminating humanity, out of a sense that terms like “AI alignment” and the others listed above tend to drift from their original meanings to encompass risks of smaller scope.[^3]
[^1]: Why equate making AI go well with AI alignment? Because if we can’t control a superintelligence that is not on our side, then the problem of making AI safe amounts to the problem of getting it on our side.
[^2]: Then known as the Singularity Institute.
[^3]: For instance, Senator Blumenthal's remark during a Senate hearing with Sam Altman (CEO of OpenAI): "I think you have said, in fact… 'Development of superhuman machine intelligence is probably the greatest threat to the continued existence of humanity.' You may have had in mind the effect on jobs, which is really my biggest nightmare."