LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published 20 days ago • 150
RynnVLA-002: A Unified Vision-Language-Action and World Model Paper • 2511.17502 • Published 24 days ago • 25
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published 26 days ago • 91
Large Language Models Do NOT Really Know What They Don't Know Paper • 2510.09033 • Published Oct 10 • 16
Scaling Language-Centric Omnimodal Representation Learning Paper • 2510.11693 • Published Oct 13 • 100
GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness Paper • 2510.00536 • Published Oct 1 • 6
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training Paper • 2509.26625 • Published Sep 30 • 43
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources Paper • 2509.21268 • Published Sep 25 • 103
GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning Paper • 2509.17437 • Published Sep 22 • 17
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation Article • Published Aug 11 • 28
Multimedia Generative Script Learning for Task Planning Paper • 2208.12306 • Published Aug 25, 2022 • 2
Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning Paper • 2312.10160 • Published Dec 15, 2023 • 2
PLANET: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation Paper • 2203.09100 • Published Mar 17, 2022 • 1
TempoSum: Evaluating the Temporal Generalization of Abstractive Summarization Paper • 2305.01951 • Published May 3, 2023 • 2
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning Paper • 2507.22607 • Published Jul 30 • 46
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19 • 134
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30 • 89
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Paper • 2506.09513 • Published Jun 11 • 101