Can you give an AI a goal which involves “minimally impacting the world”?
Penalizing an AI for affecting the world too much is called impact regularization and is an active area of alignment research.
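As a rough illustration of the idea (not any particular published method), impact regularization can be thought of as reward shaping: the agent's task reward is reduced by a penalty that grows with how much the agent has changed the world relative to some baseline, such as what would have happened if it had done nothing. The sketch below is a toy example under that assumption; all names (`impact_measure`, `baseline_state`, `penalty_weight`) are hypothetical.

```python
# Minimal sketch of impact regularization as reward shaping.
# All names here are illustrative, not from any specific library or paper.

def regularized_reward(task_reward, current_state, baseline_state,
                       impact_measure, penalty_weight=1.0):
    """Task reward minus a penalty for deviating from a baseline state.

    impact_measure(current_state, baseline_state) should return a
    non-negative number that is larger the more the agent has changed
    the world compared with the baseline.
    """
    impact = impact_measure(current_state, baseline_state)
    return task_reward - penalty_weight * impact


# Toy usage: states are dicts of features; "impact" is how many features differ.
def count_changed_features(state, baseline):
    return sum(1 for key in baseline if state.get(key) != baseline[key])

baseline = {"vase_intact": True, "door_open": False}
after_action = {"vase_intact": False, "door_open": False}

print(regularized_reward(task_reward=1.0,
                         current_state=after_action,
                         baseline_state=baseline,
                         impact_measure=count_changed_features,
                         penalty_weight=0.5))
# -> 0.5: the agent earned 1.0 but is penalized 0.5 for breaking the vase.
```

Real proposals differ mainly in how they define the impact measure and the baseline, which is a large part of what makes this an open research problem.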