How quickly would the AI capabilities ecosystem adopt promising new advances in AI alignment?

Many narrow AI alignment advances will also improve capabilities by making AI tools easier for human users to work with. Those advances are likely to be adopted almost instantly; for example, interpretability might be considered a desirable property by virtually all AI researchers.

However, with our current research methods for searching high-dimensional spaces, it seems far more likely that we find a capable AGI than one that is both capable and aligned. So even if capabilities research adopts every new alignment advance as soon as it arrives, capabilities research is still likely to outpace alignment research. We need to find ways to incentivize the corporations that would profit from more capabilities research to also focus on alignment.
