AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts Paper • 2601.11044 • Published Jan 16 • 34
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation Paper • 2512.23576 • Published Dec 29, 2025 • 65
Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving Paper • 2512.10739 • Published Dec 11, 2025 • 47
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published Dec 1, 2025 • 105
MathArena Benchmark Collection Competitions that are in the MathArena benchmark and on the website. • 23 items • Updated 6 days ago • 2
MegaMath: Pushing the Limits of Open Math Corpora Paper • 2504.02807 • Published Apr 3, 2025 • 35
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published Apr 3, 2025 • 32
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLMs on large GPU clusters • 3.72k