VIOLA: Towards Video In-Context Learning with Minimal Annotations • Paper • 2601.15549 • Published 6 days ago • 4
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning • Paper • 2601.16163 • Published 5 days ago • 13
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models • Paper • 2601.15224 • Published 6 days ago • 12
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces • Paper • 2601.11868 • Published 11 days ago • 30
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience • Paper • 2601.15876 • Published 6 days ago • 87
SOP: A Scalable Online Post-Training System for Vision-Language-Action Models • Paper • 2601.03044 • Published 21 days ago • 28
Rethinking Video Generation Model for the Embodied World • Paper • 2601.15282 • Published 6 days ago • 42
ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands • Paper • 2512.24965 • Published 27 days ago • 41
NitroGen: An Open Foundation Model for Generalist Gaming Agents • Paper • 2601.02427 • Published 23 days ago • 43
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation • Paper • 2512.24271 • Published 28 days ago • 62
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields • Paper • 2601.03252 • Published 21 days ago • 99
Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow • Paper • 2512.24766 • Published 28 days ago • 9
MAI-UI Technical Report: Real-World Centric Foundation GUI Agents • Paper • 2512.22047 • Published Dec 26, 2025 • 28
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition • Paper • 2512.15603 • Published Dec 17, 2025 • 63
MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence • Paper • 2512.10863 • Published Dec 11, 2025 • 22