GutenOCR: A Grounded Vision-Language Front-End for Documents • Paper • 2601.14490 • Published 5 days ago • 29
UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation • Paper • 2601.11522 • Published 9 days ago • 17
LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning • Paper • 2601.10129 • Published 11 days ago • 11
LSRIF: Logic-Structured Reinforcement Learning for Instruction Following • Paper • 2601.06431 • Published 16 days ago • 12
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding • Paper • 2601.10611 • Published 10 days ago • 26
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs • Paper • 2601.08763 • Published 12 days ago • 140
Urban Socio-Semantic Segmentation with Vision-Language Reasoning • Paper • 2601.10477 • Published 10 days ago • 154