Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs Paper • 2512.17008 • Published 20 days ago • 10
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers Paper • 2512.17351 • Published 20 days ago • 25
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought Paper • 2511.02779 • Published Nov 4, 2025 • 58
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance Paper • 2506.06444 • Published Jun 6, 2025 • 73
Leveraging Hyperbolic Embeddings for Coarse-to-Fine Robot Design Paper • 2311.00462 • Published Nov 1, 2023 • 1
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published May 30, 2025 • 97
Symmetry-Aware Robot Design with Structured Subgroups Paper • 2306.00036 • Published May 31, 2023 • 2
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents Paper • 2502.09560 • Published Feb 13, 2025 • 35
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models Paper • 2411.00836 • Published Oct 29, 2024 • 15