Why can’t we just “put the AI in a box” so that it can’t influence the outside world?
One possible way to ensure the safety of a powerful AI system is to keep it contained in a secure software environment, which is referred to as “boxing the AI”. There is nothing intrinsically wrong with this procedure — keeping an AI system in a secure software environment would make it safer than letting it roam free. However, even AI systems inside software environments might not be safe enough.
Reliably boxing intelligent agents is hard, as illustrated by humans who escape from jail or run drug empires while incarcerated. In particular, writing secure software is hard. Even if a powerful AI system were “boxed” off from interacting with the world through standard channels, it could potentially influence the world in exotic ways we didn’t expect, such as by learning how its own hardware works and manipulating bits to send radio signals. It could also fake a malfunction and attempt to manipulate the engineers who examine its code. As the saying goes: for someone to do something we had imagined was impossible, they need only have a better imagination.
Experimentally, humans playing the role of a boxed AI have convinced other humans to let them out of the box. Spooky.