MultiShotMaster: A Controllable Multi-Shot Video Generation Framework Paper • 2512.03041 • Published Dec 2, 2025 • 64
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations Paper • 2509.09676 • Published Sep 11, 2025 • 35
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025 • 190
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics Paper • 2506.04308 • Published Jun 4, 2025 • 43
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement Paper • 2504.01934 • Published Apr 2, 2025 • 22
Towards Physically Plausible Video Generation via VLM Planning Paper • 2503.23368 • Published Mar 30, 2025 • 40 • 3
AMD-Hummingbird: Towards an Efficient Text-to-Video Model Paper • 2503.18559 • Published Mar 24, 2025 • 5
UniTok: A Unified Tokenizer for Visual Generation and Understanding Paper • 2502.20321 • Published Feb 27, 2025 • 30
CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation Paper • 2502.08639 • Published Feb 12, 2025 • 43
Autoregressive Video Generation without Vector Quantization Paper • 2412.14169 • Published Dec 18, 2024 • 14