Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models Paper • 2601.20354 • Published 13 days ago • 110
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation Paper • 2601.20614 • Published 12 days ago • 116
VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control Paper • 2601.05138 • Published Jan 8 • 18
Urban Socio-Semantic Segmentation with Vision-Language Reasoning Paper • 2601.10477 • Published 25 days ago • 155
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization Paper • 2601.05432 • Published Jan 8 • 166
Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning Paper • 2512.24146 • Published Dec 30, 2025 • 14
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning Paper • 2505.07263 • Published May 12, 2025 • 30
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published May 14, 2025 • 99