What is "Safeguarded AI"?
Safeguarded AI (formerly the Open Agency Architecture) is a program led by David Dalrymple (“davidad”) at the UK’s Advanced Research and Invention Agency (ARIA) to develop AI systems with formal, quantitative safety guarantees.
The core vision is to develop “gatekeeper” AI systems whose job is to oversee autonomous AI agents and to prove that those agents are operating within agreed-upon safety guardrails. This requires a new modeling stack that bridges low-level physics and high-level human concepts: abstract norms (e.g., “don’t cause harm”) would need to be translated into concrete, testable physical statements so that an agent’s actions can be formally evaluated for harmful consequences.
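As a toy illustration of this idea, here is a minimal sketch assuming a hypothetical heater-control setting: the abstract norm “don’t overheat the room” is rendered as a concrete temperature bound, and a gatekeeper approves an action only if every outcome in an over-approximated set of next states satisfies that bound. All names here (`State`, `Action`, `worst_case_outcomes`, `gatekeeper_approves`) and the dynamics are illustrative assumptions, not part of the actual programme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    temperature: float  # degrees Celsius


@dataclass(frozen=True)
class Action:
    heater_power: float  # fraction of full power, 0.0 to 1.0


def worst_case_outcomes(state: State, action: Action) -> list[State]:
    """Over-approximate the set of reachable next states.

    A toy stand-in for a verified world model with quantified modelling error:
    the true next temperature is assumed to lie within +/-1 degree of the prediction.
    """
    predicted = state.temperature + 5.0 * action.heater_power
    return [State(predicted - 1.0), State(predicted + 1.0)]


def satisfies_spec(state: State) -> bool:
    """Concrete, testable rendering of the abstract norm: stay below 55 C."""
    return state.temperature < 55.0


def gatekeeper_approves(state: State, action: Action) -> bool:
    """Approve an action only if *every* worst-case outcome satisfies the spec."""
    return all(satisfies_spec(s) for s in worst_case_outcomes(state, action))


if __name__ == "__main__":
    now = State(temperature=50.0)
    print(gatekeeper_approves(now, Action(heater_power=0.2)))  # True: at most 52 C
    print(gatekeeper_approves(now, Action(heater_power=1.0)))  # False: could reach 56 C
```

The actual programme aims to go far beyond this sketch, replacing the toy ingredients with formal world models and proof certificates, and making the guarantees quantitative (e.g., probabilistic bounds on harm) rather than a simple pass/fail check.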
The project involves research in three main technical areas:
- Scaffolding: Developing tools and languages for specifying safety requirements and mathematical models of systems.
- Machine Learning: Creating AI systems that provably satisfy such safety specifications (see the sketch after this list for a toy illustration of what that could mean).
- Applications: Demonstrating high value in real-world settings.
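To give a very loose sense of what “provably satisfy” could mean, the sketch below brute-forces a check that a toy thermostat policy keeps the same temperature bound for every discretized state and worst-case disturbance in a bounded range. This exhaustive check is only a stand-in for the machine-checked proofs the programme envisages; the controller, dynamics, and bounds are all illustrative assumptions.

```python
def controller(temperature: float) -> float:
    """Toy policy: heater power as a function of temperature."""
    return 1.0 if temperature < 20.0 else 0.0


def next_temperature(temperature: float, heater_power: float) -> float:
    """Toy dynamics; modelling error is handled as a disturbance by the caller."""
    return temperature + 5.0 * heater_power


def spec_holds(temperature: float) -> bool:
    """Safety specification: the temperature never exceeds 55 C."""
    return temperature < 55.0


def verify_policy(low: float, high: float, step: float = 0.5) -> bool:
    """Check the spec for every discretized state and worst-case disturbance."""
    steps = int((high - low) / step)
    for i in range(steps + 1):
        t = low + i * step
        for disturbance in (-1.0, 1.0):
            if not spec_holds(next_temperature(t, controller(t)) + disturbance):
                return False
    return True


if __name__ == "__main__":
    print(verify_policy(0.0, 50.0))  # True under these toy assumptions
```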
The scale of the undertaking is massive — it requires solving fundamental problems in formal verification and AI alignment simultaneously. The goal is AI systems with safety guarantees as rigorous as those in established engineering disciplines.
Beyond its role in reducing existential risk, the project would enable the use of frontier AI in applications where reliability is critical, unlocking a great deal of value for businesses and governments. These applications could allow the project to be funded on a for-profit basis.
Safeguarded AI is similar in spirit to Yoshua Bengio’s Cautious Scientist AI, and this overlap has led Bengio to work with ARIA on the project. Both are attempts to build guaranteed safe AI.