---
title: Wan2.2 Video Generation
emoji: 🎥
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - video-generation
  - text-to-video
  - image-to-video
  - diffusers
  - wan
  - ai-video
  - zero-gpu
python_version: "3.10"
---

# Wan2.2 Video Generation 🎥

Generate high-quality videos from text prompts or images with the **Wan2.2-TI2V-5B** model! This Space provides an easy-to-use interface for creating videos with state-of-the-art AI technology.

## Features ✨

- **Text-to-Video**: Generate videos from descriptive text prompts
- **Image-to-Video**: Animate your images by adding an input image
- **High Quality**: 720P resolution at 24 fps
- **Customizable**: Adjust resolution, number of frames, guidance scale, and more
- **Reproducible**: Use seeds to recreate your favorite generations

## Model Information 🤖

**Wan2.2-TI2V-5B** is a unified text-to-video and image-to-video generation model with:

- **5 billion parameters**, sized for consumer-grade GPUs
- **720P resolution** support (1280x704 by default)
- **24 fps** smooth video output
- **Default duration of 3 seconds**, tuned to Zero GPU time limits

While the larger Wan2.2 A14B variants use a Mixture-of-Experts (MoE) architecture, TI2V-5B is a compact dense model built on a high-compression VAE. It delivers video generation quality competitive with many commercial models.

## How to Use 🚀

### Text-to-Video Generation

1. Enter a prompt describing the video you want to create
2. Adjust "Advanced Settings" if desired
3. Click "Generate Video"
4. Wait for generation (typically 2-3 minutes on Zero GPU with default settings)

### Image-to-Video Generation

1. Upload an input image
2. Enter a prompt describing how the image should animate
3. Click "Generate Video"
4. The output will maintain the aspect ratio of your input image
5. Generation takes 2-3 minutes with the optimized settings

## Advanced Settings ⚙️

- **Width/Height**: Video resolution (default: 1280x704)
- **Number of Frames**: Longer videos need more frames (default: 73 frames ≈ 3 seconds at 24 fps; max: 145)
- **Inference Steps**: More steps mean better quality but slower generation (default: 35, tuned for speed)
- **Guidance Scale**: How closely the model follows the prompt (default: 5.0)
- **Seed**: Set a specific seed for reproducible results

**Note**: The defaults are tuned to finish within Zero GPU's 3-minute time limit for Pro users. The sketch below shows how these settings map onto the underlying pipeline call.
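For reference, this is roughly what the generation call looks like with these defaults, following the model's official diffusers example (a minimal sketch; the actual `app.py` may differ):

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"

# The model card loads the VAE in float32 for numerical stability,
# and the rest of the pipeline in bfloat16
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="Two anthropomorphic cats in comfy boxing gear fight on stage",
    height=704,
    width=1280,
    num_frames=73,              # ≈ 3 seconds at 24 fps
    num_inference_steps=35,
    guidance_scale=5.0,
    generator=torch.Generator(device="cuda").manual_seed(42),  # fixed seed for reproducibility
).frames[0]

export_to_video(frames, "output.mp4", fps=24)
```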
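On Zero GPU hardware (see Technical Details below), the GPU is only attached while a decorated function runs. Spaces typically request it via the `spaces` package; here is a sketch of how this Space's handler might be wrapped (the function name and signature are illustrative):

```python
import spaces  # preinstalled on Zero GPU Spaces

@spaces.GPU(duration=180)  # request up to 180 s of GPU time per call
def generate_video(prompt, num_frames=73, steps=35, seed=42):
    # `pipe` is the WanPipeline loaded once at startup
    generator = torch.Generator(device="cuda").manual_seed(seed)
    return pipe(
        prompt=prompt,
        height=704,
        width=1280,
        num_frames=num_frames,
        num_inference_steps=steps,
        guidance_scale=5.0,
        generator=generator,
    ).frames[0]
```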
## Tips for Best Results 💡

1. **Detailed prompts**: Be specific about what you want to see
   - Good: "Two anthropomorphic cats in comfy boxing gear fight on stage with dramatic lighting"
   - Basic: "cats fighting"
2. **Image-to-Video**: Use clear, high-quality input images that match your prompt
3. **Quality vs. speed** (tuned for Zero GPU limits):
   - Fast: 25-30 steps (~2 minutes)
   - Balanced: 35 steps (default, ~2-3 minutes)
   - Higher quality: 40-50 steps (~3+ minutes, may time out)
4. **Experiment**: Try different guidance scales:
   - Lower (3-4): More creative, less literal
   - Default (5): Good balance
   - Higher (7-10): Strictly follows the prompt

## Example Prompts 📝

- "Two anthropomorphic cats in comfy boxing gear fight on stage"
- "A serene underwater scene with colorful coral reefs and tropical fish swimming gracefully"
- "A bustling futuristic city at night with neon lights and flying cars"
- "A peaceful mountain landscape with snow-capped peaks and a flowing river"
- "An astronaut riding a horse through a nebula in deep space"
- "A dragon flying over a medieval castle at sunset"

## Technical Details 🔧

- **Model**: Wan-AI/Wan2.2-TI2V-5B-Diffusers
- **Framework**: Hugging Face Diffusers
- **Backend**: PyTorch with bfloat16 precision
- **GPU**: Hugging Face Zero GPU (H200 slice with 70GB VRAM, automatically allocated)
- **GPU duration**: 180 seconds (3 minutes) per call for Pro users
- **Generation time**: ~2-3 minutes with the optimized settings (73 frames, 35 steps)

## Limitations ⚠️

- Generation requires compute time (2-3 minutes with default settings)
- Zero GPU allocation is time-limited (3 minutes for Pro users, 60 seconds for free users)
- Videos longer than 6 seconds (145 frames) may time out
- Higher-quality settings (50+ steps) may time out on Zero GPU
- Complex scenes with many objects can be challenging

## Credits 🙏

- **Model**: [Wan-AI](https://huggingface.co/Wan-AI)
- **Original repository**: [Wan2.2](https://github.com/Wan-Video/Wan2.2)
- **Framework**: [Hugging Face Diffusers](https://github.com/huggingface/diffusers)

## License 📄

This Space uses the Wan2.2 model, which is released under the Apache 2.0 license.

## Related Links 🔗

- [Wan-AI on Hugging Face](https://huggingface.co/Wan-AI)
- [Original Model Card](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers)
- [Diffusers Documentation](https://huggingface.co/docs/diffusers)

---

**Note**: This is a community-created Space for easy access to Wan2.2 video generation. Generation times may vary with current GPU availability.