What is everyone working on in AI alignment?
This page summarizes the alignment research of specific organizations. (See this page for an overview of four major types of alignment research.)
Organizations
Independent research groups
- Aligned AI (website) has published research on concept extrapolation, detecting distribution shifts, goal misgeneralization, preference change, and value learning, as well as work on AI governance and strategy. It was cofounded by Stuart Armstrong and Rebecca Gorman.
- The Alignment Research Center (ARC) (website) does prosaic alignment research in areas including Eliciting Latent Knowledge (ELK), Iterated Distillation and Amplification (IDA), and developing a framework for “formal heuristic arguments” based on mechanistic interpretability and formal proof methods. It incubated (as ARC Evals) the now-separate organization METR (see below), which evaluates model capabilities. It was founded by Paul Christiano.
- Apollo Research (website) is an organization focused on developing evaluations of deception and other potentially misaligned behavior in AI. It aims to create a “holistic and far-ranging model evaluation suite that includes behavioral tests, fine-tuning, and interpretability approaches”. Apollo also offers technical expertise in auditing and model evaluation to policymakers.
- The Association for Long Term Existence and Resilience (ALTER) (website) is an academic research and advocacy organization focused on the long-term future. Its recent AI safety work includes a proposal for measuring “stubbornness” in AI agents and applying value alignment to the use of AI in legal settings.
- The Center for AI Safety (CAIS) (website) is a non-profit that has done both technical and conceptual research. It runs a compute cluster specifically for ML safety research. CAIS has done literature reviews and research on robustness, anomaly detection, and machine ethics, and has developed several prominent benchmarks for evaluating AI safety and capabilities. CAIS also works on advocacy and field building for AI safety. It organized the May 2023 Statement on AI Risk.
- The Center on Long-Term Risk (website) is a research group focused on avoiding s-risk scenarios where AI agents deliberately cause great suffering. To this end, its research largely concerns game theory and decision theory, with a focus on "conflict scenarios as well as technical and philosophical aspects of cooperation."
- Conjecture (website) was formed from EleutherAI in 2022. Its alignment agenda focuses on building Cognitive Emulations (CoEms) — AI systems that emulate human reasoning processes — with the intent that their reasoning will be more transparent while remaining competitive with frontier models. Conjecture has also done work on AI governance. Its CEO and founder is Connor Leahy.
- Elicit (website; blog) is an automated research assistant tool. The team building it spun off from Ought (see below) in September 2023. The Elicit team aims to advance AI alignment by using AI to “scale up good reasoning” to arrive at “true beliefs and good decisions.” It aims to produce “process-based systems”, and toward this end has done research on factored cognition and task decomposition (see the sketch after this list for a toy illustration of task decomposition).
- EleutherAI (website) is a non-profit research lab that started as a Discord server in 2020. EleutherAI has primarily worked on training LLMs, and several of its models were the state-of-the-art open-source LLMs at the time of their release. It provides open access to these models and their codebases. It also researches interpretability, corrigibility, and mesa-optimization. EleutherAI was created by Connor Leahy, Sid Black, and Leo Gao. Its executive director is Stella Biderman.
- Encultured AI (website) was, from 2022 to 2023, a “video game company focused on enabling the safe introduction of AI technologies into [Encultured’s] game world”, to provide a platform for AI safety and alignment solutions to be tested. In 2024, Encultured shifted towards healthcare applications of AI. Encultured AI was founded by Andrew Critch, Nick Hay, and Jaan Tallinn.
- FAR AI (website), the Fund for Alignment Research, works to “incubate and accelerate research agendas that are too resource-intensive for academia but not yet ready for commercialisation by industry”. Its alignment research has included adversarial robustness, interpretability, and preference learning.
- The Machine Intelligence Research Institute (MIRI) (website) is an organization focused on reducing the risk of human extinction from AI. As of June 2024, MIRI is pivoting from technical research on the alignment of superintelligent systems to advocacy and governance work. MIRI was the first organization explicitly focused on solving the AI alignment problem and influenced the development of the field of AI safety. MIRI's main research agendas were "Agent Foundations for Aligning Machine Intelligence with Human Interests" and "Alignment for Advanced Machine Learning Systems", which focused on highly reliable agent design, value specification, and error tolerance. It was founded by Eliezer Yudkowsky.
- METR (Model Evaluation and Threat Research) (website) is "a research nonprofit that works on assessing whether cutting-edge AI systems could pose catastrophic risks to society." Its work includes evaluating frontier models for autonomous capabilities, developing a standard set of tasks for evaluating AI capabilities, and consulting on responsible scaling policies. METR was founded by Beth Barnes.
- Obelisk (website) is the AGI laboratory of the Astera Institute. It focuses on computational neuroscience and works on developing “brain-like AI” — AI inspired by the architecture of human brains. Its current work includes a computational model based on neuroscience, research furthering neuroscience itself, an evolutionary computation framework, and a training environment that scales in complexity. Obelisk was founded by Jed McCaleb.
- Orthogonal (website) is a research organization focused on agent foundations. It works primarily on the “question-answer counterfactual interval” (QACI) alignment proposal.
- Ought (website) is a product-driven research lab. Elicit, an organization building an AI research assistant, was incubated at Ought. While building Elicit, Ought focused on factored cognition and supervising LLM processes (instead of outcomes). Ought’s mission is to “scale up good reasoning” so that machine learning “help[s] as much with thinking and reflection as it does with tasks that have clear short-term outcomes”.
- Redwood Research (website) focuses on prosaic alignment techniques “motivated by theoretical arguments for how they might scale”, including work on AI control, interpretability, and “causal scrubbing”. Redwood has also run the machine learning bootcamp MLAB and the research program REMIX.
- Timaeus (website) is an organization founded in 2023 to pursue “developmental interpretability”. This agenda uses principles from Singular Learning Theory (SLT) to detect and interpret "phase transitions" — i.e., points during the training and scaling of a machine learning model where it appears to qualitatively change the way it thinks.
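Several of the groups above, notably Elicit and Ought, work on factored cognition and task decomposition. The minimal sketch below illustrates the general idea, not any organization's actual system: a hard question is broken into sub-questions, each is answered separately, and the pieces are combined, so the intermediate reasoning steps can be inspected and supervised. Roughly speaking, supervising those intermediate steps is what "process-based" refers to. The helper names (`ask_model`, `decompose`) and the canned responses are illustrative assumptions.

```python
def ask_model(prompt: str) -> str:
    """Stand-in for a call to a language model; returns a canned string here."""
    return f"<model answer to: {prompt}>"

def decompose(question: str) -> list[str]:
    """Ask the model to break a question into smaller sub-questions."""
    plan = ask_model(f"List the sub-questions needed to answer: {question}")
    # A real system would parse `plan`; here we ignore it and fabricate three sub-questions.
    return [f"{question} (sub-question {i})" for i in range(1, 4)]

def answer_by_decomposition(question: str) -> str:
    """Answer each sub-question separately, then synthesize a final answer.

    Because each step is small and legible, a human (or another model) can
    supervise the process rather than only judging the final output.
    """
    sub_answers = [ask_model(q) for q in decompose(question)]
    synthesis_prompt = (
        f"Combine these findings into an answer to '{question}':\n"
        + "\n".join(sub_answers)
    )
    return ask_model(synthesis_prompt)

if __name__ == "__main__":
    print(answer_by_decomposition("Does drug X reduce symptom Y?"))
```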
Academic research groups
- The Cambridge Computational and Biological Learning Lab's Machine Learning Group (website) is a lab located in the Department of Engineering at the University of Cambridge. Its alignment research includes work on reward hacking, goal misgeneralization, and interpretability. It is led by Adrian Weller and Hong Ge.
- The Center for Human-Compatible AI (CHAI) (website) is a research group based at UC Berkeley. CHAI works on developing “provably beneficial AI systems”, emphasizing representing uncertainty in AI objectives and getting AIs to defer to human judgment. Its research includes work on corrigibility, preference inference, transparency, oversight, agent foundations, robustness, and more. CHAI was founded by Stuart Russell.
- The Centre for the Study of Existential Risk (CSER) (website) at the University of Cambridge focuses on interdisciplinary research to mitigate existential threats, including those from biotechnology, climate change, global injustice, and AI. From 2018 to 2021, CSER's research focused on "generality" in AI, including the definition of generality, the relationship between generality and computing power, and the tradeoffs between generality and capability. CSER collaborates with the Leverhulme Centre for the Future of Intelligence on AI:FAR. CSER was co-founded by Martin Rees, Jaan Tallinn, and Huw Price.
- The MIT Algorithmic Alignment Group (website) is part of the Embodied Intelligence group at the MIT Computer Science and Artificial Intelligence Laboratory. Its alignment research spans many areas including interpretability, human-AI interaction, and multi-agent systems. The group is led by Dylan Hadfield-Menell.
- The NYU Alignment Research Group (website) is a research group, overlapping and collaborating with other groups at NYU, that does “empirical work with language models that aims to address longer-term concerns.” Its research agenda includes work on scalable oversight techniques such as debate, amplification, and recursive reward modeling; studying the behavior of language models; and designing experimental protocols that test for alignment. The group is led by Sam Bowman.
AI companies
- Anthropic (website) was founded in 2021. It is known for developing the Claude family of large language models. Anthropic's portfolio of approaches to AI alignment includes Constitutional AI (which Anthropic developed) as well as reinforcement learning from human feedback (RLHF), interpretability, activation steering, and automated red-teaming (see the sketch after this list for a toy illustration of activation steering). It is led by Dario and Daniela Amodei.
- Google DeepMind (website) formed in 2023 from a merger of DeepMind and Google Brain. Its systems include AlphaGo (which defeated top Go player Lee Sedol in 2016), AlphaFold (which predicts protein structures), and AlphaStar (which plays the video game StarCraft II). It also provides Gemini, an LLM-based chatbot/assistant. Google DeepMind's alignment research includes work on model evaluation, value learning, task decomposition, and robustness. Its CEO is Demis Hassabis.
- OpenAI (website), founded in 2015, is probably best known for introducing the generative pre-trained transformer (GPT) architecture for LLMs and for the chatbot ChatGPT. It has also created DALL-E (a text-to-image generator), Sora (a text-to-video generator), and a number of other generative AI models in other domains. OpenAI's alignment work focuses on human feedback, scalable oversight, and automating alignment research. Its CEO is Sam Altman.
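Anthropic's entry above mentions activation steering. As a rough illustration of the general technique (not Anthropic's implementation), the toy PyTorch sketch below adds a fixed "steering vector" to a hidden layer's activations at inference time via a forward hook. The tiny network, the random vector, and the scaling factor are assumptions made purely for demonstration; real work applies this kind of intervention to the residual stream of a transformer language model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny stand-in network; real activation steering targets a transformer's
# residual stream rather than a toy MLP like this one.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)

# A direction in the hidden layer's activation space to push the model toward.
steering_vector = torch.randn(32)

def steer(module, inputs, output):
    # Returning a value from a forward hook replaces the layer's output,
    # so every forward pass gets shifted by the (scaled) steering vector.
    return output + 2.0 * steering_vector

handle = model[0].register_forward_hook(steer)  # hook the first Linear layer

x = torch.randn(1, 16)
steered = model(x)

handle.remove()       # remove the hook to restore unsteered behavior
baseline = model(x)

print(steered - baseline)  # nonzero difference: the intervention changed the output
```

The appeal of this kind of intervention is that it changes behavior at inference time without retraining, which makes it a useful probe of how directions in activation space relate to model behavior.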
Defunct organizations
- The Future of Humanity Institute (FHI) (website) was a research center at the University of Oxford; it closed in 2024. Its five research groups covered macrostrategy, AI safety, AI governance, “digital minds”, and biosecurity. Its AI safety work included identifying principles to guide AI behavior and detecting novel risks, alongside work on governance. FHI was directed by Nick Bostrom.
Further reading: