What are selection theorems?

Selection¹ theorems tell us what type of agent will be selected for in a wide variety of environments.

One example of a selection theorem is the Kelly criterion. It tells us that an agent that bets so as to maximize the expected logarithm of its wealth will achieve the highest long-run growth rate of wealth, and so will be selected for in a market. Other agents’ wealth will grow more slowly, or they will eventually go bankrupt.
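To make the Kelly example concrete, here is a minimal sketch for the simplest textbook case: a repeated binary bet that is won with probability p and pays b-to-1. The function names and the specific numbers (a 60% chance of winning an even-money bet) are illustrative assumptions, not taken from the original text. The sketch compares the expected log growth per round for a cautious bettor, a Kelly bettor, and an aggressive bettor.

```python
import math


def kelly_fraction(p: float, b: float) -> float:
    """Fraction of wealth to stake on a bet won with probability p that pays b-to-1."""
    return p - (1.0 - p) / b


def expected_log_growth(f: float, p: float, b: float) -> float:
    """Expected log of the per-round wealth multiplier when staking fraction f."""
    return p * math.log(1.0 + f * b) + (1.0 - p) * math.log(1.0 - f)


# Hypothetical bet: even money (b = 1), won 60% of the time.
p, b = 0.6, 1.0
f_star = kelly_fraction(p, b)

for label, f in [("cautious", 0.05), ("Kelly", f_star), ("aggressive", 0.5)]:
    growth = expected_log_growth(f, p, b)
    print(f"{label:>10}: stake {f:.2f} of wealth, expected log growth {growth:+.4f} per round")
```

In this setup the Kelly bettor has the largest expected log growth, the cautious bettor grows more slowly, and the aggressive bettor has negative expected log growth, so its wealth shrinks toward zero over many rounds. This mirrors the sense in which the market "selects for" log-wealth maximizers.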

There are other known selection theorems, but none of them seems to match human behaviour exactly, since they make assumptions that aren’t true of people in the real world. John Wentworth's research program aims to find other, more useful selection theorems.

He argues that selection theorems can be helpful for alignment for three reasons:

They can identify structures and constraints of the human mind, helping identify human values.
They can help constrain our design of AGI by telling us which structures are viable.
They can tell us properties of agents that we might accidentally select for, thus helping solve inner misalignment.

More technically, a selection theorem has two parts: the selection metric, and the kind of agent that selection on that metric produces.

In the Kelly criterion mentioned earlier, the metric is long-term wealth. Other selection metrics include survival, inclusive genetic fitness, financial profitability, and inexploitability.

The second part of these theorems concerns the “kind” of agent, which is determined by its type signature. A selection theorem seeks to answer questions such as:

What kind of type signature does the agent have? (What are its inputs? What are its outputs?)
How is it represented/what data structures represent it?
How is it embedded in an underlying physical (or other low level) system?
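As a rough illustration of what "type signature" can mean here, the sketch below writes down two candidate signatures in Python type hints. All names (`Policy`, `ExpectedUtilityAgent`, the thermostat-style example) are hypothetical and chosen only for illustration; this is not Wentworth's formalism. The first signature is a bare input-output policy. The second has internal structure: a world model and a utility function, from which the action is derived.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Observation = str
Action = str
Outcome = str

# Candidate signature 1: a reactive policy, observations in, actions out.
Policy = Callable[[Observation], Action]


# Candidate signature 2: an agent with a world model and a utility function.
@dataclass
class ExpectedUtilityAgent:
    actions: Sequence[Action]
    world_model: Callable[[Observation, Action], Dict[Outcome, float]]
    utility: Callable[[Outcome], float]

    def act(self, obs: Observation) -> Action:
        """Pick the action with the highest expected utility under the world model."""
        def expected_utility(action: Action) -> float:
            outcomes = self.world_model(obs, action)
            return sum(prob * self.utility(o) for o, prob in outcomes.items())
        return max(self.actions, key=expected_utility)


# Toy example (assumed for illustration): a thermostat-like agent.
def toy_world_model(obs: Observation, action: Action) -> Dict[Outcome, float]:
    if obs == "cold" and action == "heat":
        return {"comfortable": 0.9, "too_hot": 0.1}
    if obs == "cold":
        return {"cold": 1.0}
    return {"comfortable": 1.0} if action == "idle" else {"too_hot": 1.0}


agent = ExpectedUtilityAgent(
    actions=["heat", "idle"],
    world_model=toy_world_model,
    utility=lambda outcome: 1.0 if outcome == "comfortable" else 0.0,
)

# The structured agent's `act` method also fits the simpler Policy signature.
policy: Policy = agent.act
print(policy("cold"), policy("warm"))
```

The two signatures can describe the same outward behaviour. A selection theorem, in the sense described above, would aim to say which internal structure a given selection pressure tends to produce.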

Selection theorems are relevant to evolutionary biology, economics, and other fields. They are descriptive theories about optimized systems as they actually are in the real world, including simple organisms such as E. coli, complex organisms such as human beings, and complex artifacts such as trained neural networks and financial markets. They aim to tackle core confusions and gaps in our understanding of alignment and agency. They are not normative theories about what an ideal agent would be like.

¹ It is called “selection” in a sense similar to natural selection in evolution. Importantly, no person is making these selections; we are talking about a process that happens on its own.
