Repurposing Geometric Foundation Models for Multi-view Diffusion Paper • 2603.22275 • Published 4 days ago
SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning Paper • 2603.22057 • Published 5 days ago
RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models Paper • 2603.21341 • Published 5 days ago
CANVAS: A Benchmark for Vision-Language Models on Tool-Based User Interface Design Paper • 2511.20737 • Published Nov 25, 2025
Vision-aligned Latent Reasoning for Multi-modal Large Language Model Paper • 2602.04476 • Published Feb 4