Byte-Sized

New AI Safety Benchmarks Reveal Progress and Gaps

Updated safety benchmarks show improved model behavior but highlight new challenges in agentic AI.

2025-11-18

The latest round of AI safety benchmarks reveals significant progress in reducing harmful outputs from language models, with leading models achieving over 95% compliance on standard safety tests. However, new challenges have emerged around agentic AI systems, meaning models that can take actions in the real world. Researchers point to risks in autonomous decision-making, cascading errors in multi-step workflows, and the difficulty of defining safety boundaries for AI agents.
