What is an expected utility maximizer?

An expected utility maximizer is a system, often called an “agent”, that acts in such a way as to maximize its expected utility, i.e., to get as much as possible of what it wants. In other words, the agent takes actions that make certain (desired, preferred) results or states of the world more likely and others (undesired, less preferred) less likely.

Such an agent has beliefs about what consequences are likely to follow from its actions in particular situations, as well as preferences over those consequences that satisfy certain consistency requirements (the subject of so-called “coherence theorems”).

The expected utility (or expected value) of an action is the average of the utilities of the possible outcomes that may follow that action, weighted by each outcome’s probability.
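As a minimal illustration (the function and numbers below are made up for this sketch, not taken from any particular source), the expected utility of an action is just the probability-weighted sum over its possible outcomes:

```python
def expected_utility(outcomes):
    """Probability-weighted average utility of one action.

    `outcomes` is a list of (probability, utility) pairs for the possible
    consequences of taking that action; the probabilities should sum to 1.
    """
    return sum(p * u for p, u in outcomes)

# Example: 70% chance of an outcome worth 10, 30% chance of one worth -5.
print(expected_utility([(0.7, 10.0), (0.3, -5.0)]))  # 0.7*10 + 0.3*(-5) = 5.5
```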

An idealized expected utility maximizer would compute, for any particular situation, the expected utility of every action it could take and choose the one with the highest expected utility. In practice, this kind of computation is usually not feasible. Even very simple rules, such as those of Go, can give rise to games so complex that explicitly computing an action’s expected value is prohibitively costly. Therefore, we usually talk about agents acting close to an expected utility maximizer, relying most of the time on heuristics (cognitive shortcuts) instead of computing the expected utility of each action in every situation. Such approximate utility maximizers may resort to deliberate planning and something like explicit utility maximization when the probabilities and utilities of the consequences are known and computing them is expected to give better results than relying on heuristics.
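Continuing the sketch above (again with made-up names and numbers), the idealized decision rule simply scores every available action and takes the one with the highest expected utility; the fallback to heuristics exists precisely because this exhaustive loop is usually intractable:

```python
def expected_utility(outcomes):
    # Same helper as in the previous sketch: probability-weighted average.
    return sum(p * u for p, u in outcomes)

def best_action(actions, outcomes_for):
    """Idealized choice rule: evaluate every action, pick the best.

    `outcomes_for(action)` returns the (probability, utility) pairs for that
    action's possible consequences.
    """
    return max(actions, key=lambda a: expected_utility(outcomes_for(a)))

# Toy decision with a 30% chance of rain.
consequences = {
    "take umbrella":  [(0.3, 8.0), (0.7, 5.0)],   # stays dry / carries it for nothing
    "leave umbrella": [(0.3, -4.0), (0.7, 6.0)],  # gets soaked / travels light
}
print(best_action(consequences, consequences.get))  # "take umbrella" (5.9 vs. 3.0)
```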

Examples of utility maximizers discussed in the field of AI safety include:

  • AIXI - A mathematical formalism describing a kind of utility maximizer.

  • Homo economicus - A naive view of humans as rational and narrowly self-interested agents. Originally the default view of human decision-making in Western economics, it is now seen as inadequate and is sometimes brought up as a contrast to the imperfect rationality of real humans, e.g., their lack of strategic thinking despite adequate intelligence.

  • Squiggle maximizer (paperclip maximizer) - A thought experiment originally meant to illustrate the view that a powerful artificial intelligence may end up with preferences that look absurd to us.

For contrast, here are some examples which are not utility maximizers:

  • Humans and other animals

  • Quantilizers - Systems that don’t look for the best solution but are instead satisfied with choosing one of the better solutions from a specified collection of possibilities (see the sketch after this list).

  • Natural selection and stochastic gradient descent - Like most optimization algorithms, these processes do not pick the option with the highest expected value; they choose among available adjacent options based on whether an option performs better than the current one on some relevant metric (also sketched below). For example, natural selection makes small modifications to the genome that improve its genetic fitness, and stochastic gradient descent modifies the neural network in order to lower the loss, based on the local gradient.
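
For contrast with the maximizer sketched earlier, here is a rough sketch of two non-maximizing choice rules; all names and numbers are illustrative assumptions, not taken from any particular implementation. The quantilizer samples from the better fraction of options instead of taking the single best, and the greedy local step only compares adjacent options, in the spirit of natural selection or stochastic gradient descent:

```python
import random

def quantilize(options, score, q=0.1):
    """Pick uniformly at random from the top q fraction of options,
    rather than always returning the single highest-scoring one."""
    ranked = sorted(options, key=score, reverse=True)
    cutoff = max(1, int(len(ranked) * q))
    return random.choice(ranked[:cutoff])

def greedy_local_step(current, neighbours, score):
    """Move to the best adjacent option if it beats the current one.

    Nothing here ranks every possible action by expected utility; the rule
    only asks whether a nearby option scores better on one local metric,
    loosely analogous to a single step of selection or gradient descent.
    """
    best_neighbour = max(neighbours, key=score)
    return best_neighbour if score(best_neighbour) > score(current) else current
```

Both rules make things better without ever computing expected utilities over all possible actions, which is what separates them from the idealized maximizer sketched earlier.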