AI safety is a research field founded to avoid catastrophic outcomes from advanced AI, though the term has since expanded to include reducing less extreme harms from AI.
AI existential safety, or AGI safety, is about reducing the existential risk from artificial general intelligence (AGI). Artificial general intelligence is AI that is at least as competent as humans in all skills that are relevant for making a difference in the world. AGI has not been developed yet, but many researchers expect it to be developed this century.
A central part of AGI safety is ensuring that what AIs do is actually what we want. This is called AI alignment (also often just called alignment), because it’s about aligning an AI with human values. Alignment is difficult, and building AGI is probably very dangerous, so it is important to mitigate the risks as much as possible. Examples of work on AI existential safety include:
trying to gain a foundational understanding of what intelligence is, e.g. agent foundations
Outer and inner alignment: ensuring the objective of the training process is actually what we want (outer alignment), and ensuring the objective of the resulting system is actually what we want (inner alignment)
AI policy/strategy: e.g. researching the best way to set up institutions and mechanisms that help with safe AGI development, and making sure AI isn’t used by bad actors
There are also areas of research which are useful both for near-term safety and for existential safety. For example, robustness to distribution shift and interpretability both help with making current systems safer, and are likely to help with AGI safety.
Near-term AI safety is about preventing bad outcomes from current systems. Examples of work on near-term AI safety include:
getting content recommender systems to not radicalize their users
ensuring autonomous cars don’t kill people
advocating for strict regulations on lethal autonomous weapons
While near-term AI safety is significant, this FAQ focuses on AI existential safety, as it has the potential to be dramatically more important for humanity’s future.