MOVA: Towards Scalable and Synchronized Video-Audio Generation • Paper • 2602.08794 • Published 3 days ago • 144
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution • Paper • 2510.25726 • Published Oct 29, 2025 • 46
The Hydra Effect: Emergent Self-repair in Language Model Computations • Paper • 2307.15771 • Published Jul 28, 2023 • 19
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla • Paper • 2307.09458 • Published Jul 18, 2023 • 11