What is DeepMind's safety team working on?
DeepMind has both a machine learning safety team focused on near-term risks and an alignment team working on risks from artificial general intelligence. The alignment team is pursuing many different research agendas.
Their work includes:
- Engaging with recent arguments from the Machine Intelligence Research Institute.
- The Alignment Newsletter and its accompanying podcast, produced by Rohin Shah.
- Research such as the Goal Misgeneralization paper.
- Geoffrey Irving's work on debate as an alignment strategy.
- "Discovering Agents", which introduces a causal definition of agents, along with an algorithm for finding agents in empirical data.
See Shah's comment for more of the team's research, including descriptions of some work that is currently unpublished.