What is Anthropic's approach to LLM alignment?

Anthropic fine-tuned a language model to be more helpful, honest, and harmless (HHH).

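The source does not say how the fine-tuning was done, but a common RLHF-style approach is to first train a preference model that scores human-preferred ("chosen") responses above dispreferred ("rejected") ones, then use that score to steer the language model. Below is a minimal, self-contained sketch of the preference-model step using a Bradley-Terry pairwise loss; the model, names, and toy data are all hypothetical, not Anthropic's actual setup.

```python
import torch
import torch.nn as nn

class ToyPreferenceModel(nn.Module):
    """Hypothetical toy model: maps token ids to a scalar reward."""
    def __init__(self, vocab_size: int = 1000, dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> one scalar reward per example
        pooled = self.embed(token_ids).mean(dim=1)
        return self.head(pooled).squeeze(-1)

model = ToyPreferenceModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in preference pairs; in practice these would be tokenized
# (prompt + response) pairs labeled by human raters.
chosen = torch.randint(0, 1000, (8, 16))
rejected = torch.randint(0, 1000, (8, 16))

for _ in range(100):
    r_chosen, r_rejected = model(chosen), model(rejected)
    # Bradley-Terry pairwise loss: push the chosen response's
    # reward above the rejected response's reward.
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

A full pipeline would then fine-tune the language model against this learned reward (e.g. with a policy-gradient method), but that is beyond this sketch.
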
Motivation: The aims of this work are to:

  1. see if we can "align" a current-day LLM, and

  2. raise awareness about safety in the broader ML community.