LTX-2: Efficient Joint Audio-Visual Foundation Model • Paper • 2601.03233 • Published 19 days ago • 134
Instruct-Imagen: Image Generation with Multi-modal Instruction • Paper • 2401.01952 • Published Jan 3, 2024 • 32
InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing • Paper • 2508.14033 • Published Aug 19, 2025 • 1
A Survey of Context Engineering for Large Language Models • Paper • 2507.13334 • Published Jul 17, 2025 • 260
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models • Paper • 2505.04921 • Published May 8, 2025 • 185