Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published 10 days ago • 28
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published 4 days ago • 50
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model Paper • 2601.15892 • Published 4 days ago • 47
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models Paper • 2601.15165 • Published 5 days ago • 63
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience Paper • 2601.15876 • Published 4 days ago • 84
Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing Paper • 2601.04575 • Published 19 days ago • 8
Implicit Neural Representation Facilitates Unified Universal Vision Encoding Paper • 2601.14256 • Published 6 days ago • 5
FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning Paper • 2601.11141 • Published 10 days ago • 19
Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics Paper • 2601.14027 • Published 6 days ago • 11
Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning Paper • 2601.14750 • Published 5 days ago • 16
Rethinking Video Generation Model for the Embodied World Paper • 2601.15282 • Published 5 days ago • 42
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published 8 days ago • 47
FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation Paper • 2601.13976 • Published 6 days ago • 21