What is a "warning shot"?
A warning shot, in the context of AI safety, is an AI-caused event that inspires greatly increased concern about AI-related existential risk, for example by causing extreme damage or a global catastrophe (short of human extinction, which would leave nobody to be warned).1 The term was introduced to the AI safety literature in Paul Christiano's 2019 article “What failure looks like.”
For instance, a warning shot might be an incident in which an unaligned AI system with human-level intelligence attempts to take over a data center but is stopped before it can do significant harm. Although such an event would not itself cause extinction, it could prompt governments and AI researchers to become more supportive of AI safety research and more concerned about the existential risks posed by AI.
One notable warning shot was the partial meltdown of the Three Mile Island nuclear reactor in 1979, which, for better or worse, marked a turning point in the American public’s perception of nuclear risks. This analysis of the incident points to lessons relevant to AI risk.
The COVID-19 pandemic could also be considered a warning shot about biological risk. It exposed weaknesses in the global response to pandemics and highlighted the need for better coordination and for investment in areas such as vaccine infrastructure.
In summary, a warning shot is a dangerous AI-caused event that sparks concern about risks from advanced AI and may inspire strong measures to reduce those risks. It remains uncertain whether governments and other institutions would respond to warning shots effectively, or whether any warning shot will occur at all before an existential disaster.
Some people use a broad definition of “warning shot,” on which any AI-caused event that inspires concern counts as a warning shot; others use a narrower definition, on which an event counts as a warning shot only if it has disastrous consequences such as mass death. An AI system that tried to break out of safeguards meant to contain it, but was detected before it could do any harm, would be a warning shot in the broad sense but not the narrow sense. ↩︎