End-to-end Music Remastering System Using Self-supervised and Adversarial Training Paper • 2202.08520 • Published Feb 17, 2022 • 2
SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering Paper • 2508.03448 • Published Aug 5, 2025 • 6
DreamOmni2: Multimodal Instruction-based Editing and Generation Paper • 2510.06679 • Published Oct 8, 2025 • 73
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models Paper • 2509.17627 • Published Sep 22, 2025 • 66
MGM-Omni Collection • MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech • 18 items • Updated Oct 11, 2025 • 11
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators Paper • 2402.06894 • Published Feb 10, 2024 • 1
MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published Mar 30, 2025 • 138
FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation Paper • 2502.13995 • Published Feb 19, 2025 • 9
VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation Paper • 2502.07531 • Published Feb 11, 2025 • 12
Stable Flow: Vital Layers for Training-Free Image Editing Paper • 2411.14430 • Published Nov 21, 2024 • 22
Zero-Shot Voice Cloning Collection • TTS models that support zero-shot voice cloning • 8 items • Updated about 1 month ago • 14