AI safety is a research field founded to avoid catastrophic outcomes from advanced AI, though the term has since expanded to include reducing less extreme harms from AI.
AI existential safety, or AGI safety, is about reducing the existential risk from artificial general intelligence (AGI). Artificial general intelligence is AI that is at least as competent as humans in all skills that are relevant for making a difference in the world. AGI has not been developed yet, but many researchers expect it to be developed this century.
A central part of AGI safety is ensuring that what AIs do is actually what we want. This is called AI alignment (also often just called alignment), because it’s about aligning an AI with human values. Alignment is difficult, and building AGI is probably very dangerous, so it is important to mitigate the risks as much as possible. Examples of work on AI existential safety include:
trying to gain a foundational understanding of what intelligence is, e.g. agent foundations
Outer and inner alignment: ensuring the objective of the training process is actually what we want (outer alignment), and ensuring the objective of the resulting system is actually what we want (inner alignment)
AI policy/strategy: e.g. researching the best way to set up institutions and mechanisms that help with safe AGI development, and making sure AI isn’t used by bad actors
There are also areas of research which are useful both for near-term safety and for existential safety. For example, robustness to distribution shift and interpretability both help with making current systems safer, and are likely to help with AGI safety.
Near-term AI safety is about preventing bad outcomes from current systems. Examples of work on near-term AI safety include:
getting content recommender systems to not radicalize their users
ensuring autonomous cars don’t kill people
advocating for strict regulations on lethal autonomous weapons
While near-term AI safety is significant, this FAQ focuses on AI existential safety, as it has the potential to be dramatically more important for humanity’s future.