How can I do conceptual, mathematical, or philosophical work on AI alignment?

General advice

There isn’t a standard career path in this area. AI alignment is a pre-paradigmatic field in which nobody has a good idea what the right prerequisite knowledge is or what an answer looks like. That means this is a path for people who are willing to wrestle with uncertainty.

Financially supporting your research can be hard. Funding is not reliably available, but opportunities do exist.

Rather than thinking of your goal as trying to “become a researcher”, it might be better to think of it as trying to solve the alignment problem. You can get started by reading and thinking about the problem, commenting on posts, and writing down your own ideas in private docs or on LessWrong. Don’t count on receiving feedback unless you actively reach out to people who might have good thoughts. It also helps to find peers to stay in contact with.

One way to get into conceptual work is by writing distillations of other people’s work, or critiquing key posts in places like LessWrong (which includes everything that has been posted on the Alignment Forum). It’s important to develop your own “inside view” on the problem.

Consider asking around your personal network for an alignment research mentor, or a collaborator who knows the literature and can give you pointers and feedback. This is unlikely to work with leading alignment researchers, who already get a lot of requests for mentorship, but is more likely to succeed with people you know locally who can teach you generic research skills; it depends a lot on the person. If you can get a mentor, that’s great, but you don’t need one to succeed, so don’t get blocked on it. Almost everything you can get from a mentor, you can also get from a mix of learning by doing, discussing with peers, and getting their feedback. Without a mentor to guide you, it will take a bit longer and you’ll probably hit a few more dead ends, but you can do it.

Training programs

Consider training programs (e.g. SERI-MATS) and internships. AI Safety Training has an overview of these. AGI Safety Fundamentals runs courses on AI alignment and governance. The 80,000 Hours AI safety syllabus lists a lot of reading material. For more suggestions, look at Linda Linsefors’s collection of do-it-yourself training programs.

If you’re applying to a program, choose whichever one you think you will most enjoy. The important thing is to start learning the field and to get some contacts. You’ll end up learning different things in different programs, but you won’t be locked into that path. You’re free to continue exploring whatever direction you want and to apply to other programs in the future, and you’ll have a much easier time navigating the space when you have some context and some connections.

Guides and resources

Some helpful guides:

Other resources:



AISafety.info

AISafety.info is a project founded by Rob Miles. The website is maintained by a global team of specialists and volunteers from various backgrounds who want to ensure that the effects of future AI are beneficial rather than catastrophic.

AISafety.info is an Ashgro Inc Project. Ashgro Inc (EIN: 88-4232889) is a 501(c)(3) Public Charity incorporated in Delaware.