What is the strong scaling hypothesis?


A companion to the scaling laws is the scaling hypothesis. Here is a description from gwern:

Scaling Hypothesis

"The strong scaling hypothesis is that, once we find a scalable architecture like self-attention or convolutions, [...] we can simply train ever larger [neural networks] and ever more sophisticated behavior will emerge naturally as the easiest way to optimize for all the tasks & data."

If this hypothesis holds, the scaling laws become highly relevant to safety, insofar as capability gains become conceptually easier to achieve: there is no need for a clever design to solve a given task, because training a larger model with more compute is expected to solve it eventually.
