How might things go wrong even without an agentic AI?

Failures can happen even with narrow, non-agentic systems, mostly when humans fail to anticipate safety-relevant decisions that are made too quickly for anyone to react, much as in the 2010 flash crash.

A helpful metaphor draws on self-driving cars. By relying more and more on an automated process to make decisions, people become worse drivers because they are no longer practicing their reactions to the unexpected. Then the unexpected happens, the software reacts in an unsafe way, and the human is too slow to regain control.

This generalizes to broader tasks. A human using a powerful system to make better decisions (say, as the CEO of a company) might not understand those decisions very well, become trapped in an equilibrium without realizing it, and essentially lose control over the entire process.

More detailed examples in this vein are described by Paul Christiano in “What failure looks like”.

Another source of failures is AI-mediated stable totalitarianism. The limiting factor in current pervasive surveillance, police and armed forces is manpower; the use of drones and other automated tools decreases the need for personnel to ensure security and extract resources.

As capabilities improve, political dissent could become impossible, and checks and balances could break down because only a minimal number of key actors would be needed to stay in power.
