Why can’t we just use Asimov’s Three Laws of Robotics?
In Isaac Asimov's science fiction, the Three Laws of Robotics are a set of rules that robots are programmed to follow. They are:
- The First Law: A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- The Second Law: A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
- The Third Law: A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
Rules of this kind are sometimes proposed as a way that we could ensure that an AI behaves as we want it to. However, "ordinary language" rules like this don't capture everything about how we'd like AI to act. Concepts that seem intuitive to us, like “harm”, are actually quite nuanced, and only seem simple to us because we have a lot of context about how to understand them. Additionally, lists of rules like the Three Laws are almost certain to fail to cover atypical situations or edge cases that the rulemakers have not considered.
Indeed, Asimov's stories involving these laws primarily explore how they can go wrong. For example, “Liar!” features a robot that lies to humans to spare their feelings, reasoning that hurt feelings count as harm under the First Law. It ends up in an impossible situation when it realizes that deceiving humans also harms them.