Article: Post-Training Isaac GR00T N1.5 for LeRobot SO-101 Arm. nvidia, Jun 11, 2025.
Collection: InternVL3.5. All released checkpoints of InternVL3.5, covering different training stages (e.g., Pretraining, SFT, MPO, Cascade RL). 45 items, updated Mar 2.
Paper: InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency. arXiv:2508.18265, published Aug 25, 2025.
Article: SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data. danaaubakirova, andito, merve, ariG23498, fracapuano, loubnabnl, pcuenq, mshukor, cadene, Jun 3, 2025.
Paper: InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models. arXiv:2504.10479, published Apr 14, 2025.
Paper: Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills. arXiv:2503.12533, published Mar 16, 2025.
Paper: LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL. arXiv:2503.07536, published Mar 10, 2025.
Paper: OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference. arXiv:2502.18411, published Feb 25, 2025.
Paper: R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts. arXiv:2502.20395, published Feb 27, 2025.
Paper: MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning. arXiv:2502.19634, published Mar 19, 2025.
Paper: Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs. arXiv:2503.01743, published Mar 3, 2025.
Paper: R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model. arXiv:2503.05132, published Mar 7, 2025.
Paper: Unified Reward Model for Multimodal Understanding and Generation. arXiv:2503.05236, published Mar 7, 2025.
Article: Open-R1: a fully open reproduction of DeepSeek-R1. eliebak, lvwerra, lewtun, Jan 28, 2025.
Article: We now support VLMs in smolagents! m-ric, merve, albertvillanova, Jan 24, 2025.
Paper: Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces. arXiv:2412.14171, published Dec 18, 2024.