What is Anthropic's approach to LLM alignment?
Anthropic fine-tuned a language model to be more helpful, honest, and harmless (HHH).
Motivation: the goals of this work are to:
- see if we can "align" a current-day LLM, and
- raise awareness about safety in the broader ML community.