Byte-Sized
New AI Safety Benchmarks Reveal Progress and Gaps
Updated safety benchmarks show improved model behavior but highlight new challenges in agentic AI.
2025-11-18
The latest round of AI safety benchmarks shows significant progress in reducing harmful outputs from language models, with leading models reaching 95% or higher compliance on standard safety tests. However, new challenges have emerged around agentic AI systems, meaning models that can take actions in the real world. Researchers highlight risks around autonomous decision-making, cascading errors in multi-step workflows, and the difficulty of defining safety boundaries for AI agents.