What are some good books about AI safety?

The AI Does Not Hate You (2019) by Tom Chivers is an entertaining and accessible outline of the core ideas around AI existential risk, along with an exploration of the community and culture of AI safety researchers.

The Alignment Problem (2020) by Brian Christian is a comprehensive overview, from a machine learning perspective, of the challenges that come with aligning AI systems.

The book that first made the case to the public is Nick Bostrom’s Superintelligence (2014). It gives an excellent overview of the state of the field as it was then and makes a strong case for why AI safety is important. However, it barely covers deep learning, now the leading AI paradigm, and so does not discuss newer developments such as language models. It is packed with detail and written in a style that some people enjoy but others find too dense.

There's also Human Compatible (2019) by Stuart Russell, which gives a more up-to-date review of developments, with an emphasis on the approaches that the Center for Human-Compatible AI is working on, such as cooperative inverse reinforcement learning. There's a good review and summary on SlateStarCodex, which explains how the book manages to be impressively non-weird given its subject matter.

Various other books explore these issues, such as Toby Ord’s The Precipice (2020), Max Tegmark’s Life 3.0 (2017), Yuval Noah Harari’s Homo Deus (2016), Stuart Armstrong’s Smarter Than Us (2014), and Luke Muehlhauser’s Facing the Intelligence Explosion (2013).