VIDEOP2R: Video Understanding from Perception to Reasoning • Paper 2511.11113 • Published Nov 14, 2025
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations • Paper 2510.23607 • Published Oct 27, 2025
Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality • Paper 2505.18227 • Published May 23, 2025
DeepCritic: Deliberate Critique with Large Language Models • Paper 2505.00662 • Published May 1, 2025