Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published 6 days ago • 77
Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation Paper • 2512.16913 • Published 6 days ago • 31
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights Paper • 2512.01816 • Published 24 days ago • 88
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward Paper • 2511.20561 • Published 29 days ago • 31
UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios Paper • 2511.18050 • Published Nov 22 • 37
SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models Paper • 2510.12784 • Published Oct 14 • 19
Paper2Video: Automatic Video Generation from Scientific Papers Paper • 2510.05096 • Published Oct 6 • 118
VideoNSA: Native Sparse Attention Scales Video Understanding Paper • 2510.02295 • Published Oct 2 • 9