To understand the risks posed by AI, it helps to understand AI itself – both how it currently works and how it might work in the future. This involves the details of its design and training, as well as research into the interpretability of the resulting systems.
On a more abstract level, it helps to understand the dynamics that could create existential risks if AI is scaled up to intelligence far above the human level. Various concepts have been defined to help us think about such risks.
Acting strategically requires models of how the future of advanced AI will play out. This involves answering questions like when we will get advanced AI, how fast the transition to superintelligence will be, and what a superintelligence would be capable of.
A key part of mitigating AI risk is aligning AI with human intentions. There are a range of approaches to this: for example, methods like adversarial training and learning from human feedback have made current AI systems more likely to produce the kinds of outputs intended by their designers. However, these methods have weaknesses, and the safety methods used on existing models may not generalize to future contexts. There are various proposals for scaling safety techniques as capabilities increase, as well as attempts to investigate the alignment problem at a more fundamental level.
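To make "learning from human feedback" a bit more concrete, here is a minimal sketch of one common ingredient: training a reward model on human preference comparisons using a Bradley-Terry-style loss, so that responses people prefer receive higher scores. This is an illustrative toy, not any particular lab's implementation; the RewardModel class and the randomly generated "preference data" below are hypothetical placeholders.

```python
# Toy sketch of preference-based reward modeling (assumed setup, not a real pipeline).
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a (toy) fixed-size response embedding to a scalar reward."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    # Stand-ins for embeddings of a preferred and a rejected response to the same prompt.
    chosen = torch.randn(8, 16)
    rejected = torch.randn(8, 16)
    # Bradley-Terry objective: the human-preferred response should score higher.
    loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In a full system, a reward model like this would then be used to fine-tune the AI system itself, which is where many of the weaknesses mentioned above (such as reward models failing to capture what designers actually intend) can show up.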
Where technical alignment is about the design of AI systems, AI governance is about human decision-making around those systems. It focuses on designing 1) policies for safe AI development and deployment, for both current and future AI systems, and 2) incentive and enforcement mechanisms to ensure that relevant actors, such as governments and AI developers, follow those policies.