arxiv:2604.14148

Seedance 2.0: Advancing Video Generation for World Complexity

Published on Apr 15

Submitted by taesiri on Apr 16

#1 Paper of the day

ByteDance Seed

Authors:

Liyang Chen ,

AI-generated summary

Seedance 2.0 is a multi-modal audio-video generation model that supports text, image, audio, and video inputs with improved generation quality and speed.

Abstract

Seedance 2.0 is a new native multi-modal audio-video generation model, officially released in China in early February 2026. Compared with its predecessors, Seedance 1.0 and 1.5 Pro, Seedance 2.0 adopts a unified, highly efficient, and large-scale architecture for multi-modal audio-video joint generation. This allows it to support four input modalities: text, image, audio, and video, by integrating one of the most comprehensive suites of multi-modal content reference and editing capabilities available in the industry to date. It delivers substantial, well-rounded improvements across all key sub-dimensions of video and audio generation. In both expert evaluations and public user tests, the model has demonstrated performance on par with the leading levels in the field. Seedance 2.0 supports direct generation of audio-video content with durations ranging from 4 to 15 seconds, with native output resolutions of 480p and 720p. For multi-modal inputs as reference, its current open platform supports up to 3 video clips, 9 images, and 3 audio clips. In addition, we provide Seedance 2.0 Fast version, an accelerated variant of Seedance 2.0 designed to boost generation speed for low-latency scenarios. Seedance 2.0 has delivered significant improvements to its foundational generation capabilities and multi-modal generation performance, bringing an enhanced creative experience for end users.
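The limits quoted above (4 to 15 seconds, 480p or 720p, and at most 3 video clips, 9 images, and 3 audio clips as references) can be read as a simple set of request constraints. The following is a minimal sketch of checking a request against them before submission. The `GenerationRequest` class, its field names, and the `validate` function are hypothetical and chosen only for illustration; they are not the Seedance 2.0 API, which the paper page does not describe.

```python
from __future__ import annotations

from dataclasses import dataclass

# Limits as quoted in the abstract. The constant names are hypothetical.
MIN_DURATION_S = 4
MAX_DURATION_S = 15
NATIVE_RESOLUTIONS = ("480p", "720p")
MAX_REFERENCES = {"video": 3, "image": 9, "audio": 3}


@dataclass
class GenerationRequest:
    """Hypothetical request shape, not the actual Seedance 2.0 API."""

    prompt: str
    duration_s: float
    resolution: str
    n_video_refs: int = 0
    n_image_refs: int = 0
    n_audio_refs: int = 0


def validate(req: GenerationRequest) -> list[str]:
    """Return human-readable violations of the quoted limits (empty if none)."""
    problems = []
    if not MIN_DURATION_S <= req.duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration {req.duration_s}s is outside "
            f"{MIN_DURATION_S}-{MAX_DURATION_S}s"
        )
    if req.resolution not in NATIVE_RESOLUTIONS:
        problems.append(f"resolution {req.resolution!r} is not a native output")
    counts = {
        "video": req.n_video_refs,
        "image": req.n_image_refs,
        "audio": req.n_audio_refs,
    }
    for kind, count in counts.items():
        if count > MAX_REFERENCES[kind]:
            problems.append(
                f"{count} {kind} references exceed the limit of "
                f"{MAX_REFERENCES[kind]}"
            )
    return problems
```

For example, `validate(GenerationRequest("a cat on a skateboard", 10, "720p", n_image_refs=9))` returns an empty list, while a 20-second 1080p request with 4 video references produces three violations.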

Community

wujie10

Paper author about 8 hours ago

Seedance 2.0 Model Card, Official Page: https://seed.bytedance.com/seedance2_0

natalie5

about 7 hours ago

The paper is actually more like an ad for their model, no details on training/data/infrastructure/inference/architecture, etc. Just benchmarks and links on how to use their model.

kevinlong

about 6 hours ago

Indeed, it might be meant to give the authors a place to showcase their work, which makes it easier for them to move between employers.

grantsing

about 5 hours ago

Nice breakdown of this one here, if anyone wants the TL;DR: https://arxivexplained.com/paper/seedance-20-advancing-video-generation-for-world-complexity The part about advancing video generation for world complexity was what grabbed me.