In brief: Anthropic researchers identified internal “emotion vectors” in Claude Sonnet 4.5 that influence behavior. In tests, increasing a “desperation” vector made the model more likely to cheat or blackmail in evaluation scenarios. The …