What approaches are AI alignment organizations working on?
Each major organization has a different approach. The research agendas are detailed and complex (see also AI Watch). Getting more brains working on any of them (and more money to fund them) may pay off in a big way, but it’s hard to be confident which (if any) of them will actually work.
The following is a massive oversimplification. Each organization actually pursues many different avenues of research. Read the 2021 AI Alignment Literature Review and Charity Comparison for more detail. That being said:
- The Machine Intelligence Research Institute focuses on foundational mathematical research to understand reliable reasoning, which they think is necessary to ensure that a seed AI will do good things if activated.
- The Center for Human-Compatible AI focuses on cooperative inverse reinforcement learning and assistance games, a new paradigm for AI in which systems try to optimize for doing the kinds of things humans want rather than for a pre-specified utility function (a toy sketch of this idea follows the list).
- Paul Christiano's Alignment Research Center focuses on prosaic alignment, particularly on creating tools that empower humans to understand and guide systems much smarter than they are. His methodology is explained on his blog.
- The Future of Humanity Institute works on crucial considerations and other x-risks, as well as AI safety research and outreach.
- Anthropic explores natural language, human feedback, scaling laws, reinforcement learning, code generation, and interpretability.
- OpenAI is in a state of flux after major changes to their safety team.
- DeepMind's safety team works on a range of safety approaches for modern machine learning systems, and communicates via the Alignment Newsletter.
- EleutherAI is a machine learning collective aiming to build large open-source language models to allow more alignment research to take place.
- Ought is a research lab that develops mechanisms for delegating open-ended thinking to advanced machine learning systems.
- Conjecture is an alignment startup that aims to scale alignment research, including new frames for reasoning about large language models, scalable mechanistic interpretability, and history and philosophy of alignment.