Which moral theories would be easiest to encode into an AI?

There are three major approaches to normative ethics (as well as some attempts to unify two or all three of them): virtue ethics, deontological ethics, and consequentialist ethics.

Virtue ethicists believe that, at its core, leading an ethical life means cultivating virtues. In other words: what counts is not so much what one does moment-to-moment as whether one makes an effort to become the kind of person who habitually acts appropriately in all kinds of different situations. A prominent example of virtue ethics is Stoicism.

Deontological ethicists believe that an ethical life is all about following certain behavioral rules, regardless of the consequences. Prominent examples include the Ten Commandments in Christianity, Kant's "categorical imperative" in philosophy, and Asimov's Three Laws of Robotics in science fiction.

Consequentialist ethicists believe that neither one's character nor the rules one lives by are what makes actions good or bad. Instead, consequentialists believe that only the consequences of an action count, both direct and indirect. A prominent example of consequentialist ethics is utilitarianism: the view that the most moral actions are those that lead to the greatest good for the greatest number of individuals.

The short answer to the question of which of these might be easiest to encode into an AI is: we don't know. However, reinforcement learning agents optimize for consequences, not virtues or hard-coded rules. So if we are directly encoding a moral system, consequentialism may be the most relevant.
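To make the contrast concrete, here is a minimal toy sketch (in Python) of the difference between consequentialist and deontological action selection: the consequentialist agent scores each action by its expected outcome, while the deontological agent first discards any action that breaks a fixed rule, whatever its consequences. All names, actions, and numbers below are hypothetical illustrations, not any real system's objective.

```python
# Toy sketch: consequentialist vs. deontological action selection.
# Everything here (actions, probabilities, utilities) is a hypothetical
# illustration, not a real agent's reward model.

def expected_utility(action, outcomes):
    """Consequentialist scoring: utility of each possible outcome,
    weighted by its probability."""
    return sum(prob * utility for prob, utility in outcomes[action])

def violates_rule(action, forbidden):
    """Deontological check: an action is impermissible if a rule
    forbids it, no matter how good its consequences would be."""
    return action in forbidden

# Each action maps to a list of (probability, utility) pairs.
outcomes = {
    "lie":        [(0.9, 10.0), (0.1, -5.0)],  # usually pays off
    "tell_truth": [(1.0, 3.0)],                # modest but certain good
}
forbidden = {"lie"}  # a hard-coded rule: never lie

# A pure consequentialist picks the action with the best expected outcome.
consequentialist_choice = max(outcomes, key=lambda a: expected_utility(a, outcomes))

# A deontologist filters out rule-violating actions before choosing.
permissible = [a for a in outcomes if not violates_rule(a, forbidden)]
deontological_choice = max(permissible, key=lambda a: expected_utility(a, outcomes))

print(consequentialist_choice)  # "lie"        (expected utility 8.5 > 3.0)
print(deontological_choice)     # "tell_truth" (the only permissible action)
```

In this framing, an RL reward function plays the role of `expected_utility`: the agent is trained to maximize it, which is why consequentialism maps most directly onto how such agents already work.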

On the other hand, if AGI is developed through LLMs or other simulators, virtue ethics (i.e., simulating virtuous personalities) might be more relevant.

It’s worth noting that the ease with which we can encode these theories into an AI should not be the only criterion for choosing which theory to use.


