Anchor Invariance: The Next Step in LLM Safety
Anchor Invariance Regularization (AIR) aims to better align large language models (LLMs) with human intent by improving their context invariance, which makes them less susceptible to adversarial manipulation.