---
title: Wan2.2 Video Generation
emoji: 🎥
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - video-generation
  - text-to-video
  - image-to-video
  - diffusers
  - wan
  - ai-video
  - zero-gpu
python_version: "3.10"
---

# Wan2.2 Video Generation 🎥

Generate high-quality videos from text prompts or images using the powerful **Wan2.2-TI2V-5B** model!
This Space provides an easy-to-use interface for creating videos with state-of-the-art AI technology.

## Features ✨

- **Text-to-Video**: Generate videos from descriptive text prompts
- **Image-to-Video**: Animate your images by adding an input image
- **High Quality**: 720P resolution at 24 fps
- **Customizable**: Adjust resolution, number of frames, guidance scale, and more
- **Reproducible**: Use seeds to recreate your favorite generations

## Model Information 🤖

**Wan2.2-TI2V-5B** is a unified text-to-video and image-to-video generation model with:

- **5 billion parameters** optimized for consumer-grade GPUs
- **720P resolution** support (1280x704 default)
- **24 fps** smooth video output
- **3-second default duration**, tuned to fit within Zero GPU time limits

The Wan2.2 family introduces a Mixture-of-Experts (MoE) architecture in its larger A14B models; this compact 5B TI2V variant is a dense model that still delivers outstanding video generation quality, surpassing many commercial models.
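
For reference, loading the model with Diffusers looks roughly like the snippet below. This is a minimal sketch assuming the standard `WanPipeline` API in recent Diffusers releases, not the exact code in `app.py`; see the model card for the officially recommended loading snippet.

```python
import torch
from diffusers import WanPipeline

# Load the Diffusers-format checkpoint in bfloat16, the precision this Space uses
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-TI2V-5B-Diffusers",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")
```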

## How to Use 🚀

### Text-to-Video Generation

1. Enter your prompt describing the video you want to create
2. Adjust settings in "Advanced Settings" if desired
3. Click "Generate Video"
4. Wait for generation (typically 2-3 minutes on Zero GPU with default settings)
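
Under the hood, a text-to-video request corresponds roughly to a single pipeline call like the one below, using the defaults listed under Advanced Settings. Parameter names follow the standard Diffusers video pipeline interface and may differ slightly from the actual `app.py`.

```python
from diffusers.utils import export_to_video

# One text-to-video call with the Space's default settings
output = pipe(
    prompt="Two anthropomorphic cats in comfy boxing gear fight on stage",
    height=704,
    width=1280,
    num_frames=73,             # ~3 seconds at 24 fps
    num_inference_steps=35,
    guidance_scale=5.0,
)
export_to_video(output.frames[0], "output.mp4", fps=24)
```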

### Image-to-Video Generation

1. Upload an input image
2. Enter a prompt describing how the image should animate
3. Click "Generate Video"
4. The output will maintain the aspect ratio of your input image
5. Generation takes 2-3 minutes with optimized settings
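
Image-to-video follows the same pattern with an extra `image` input. A rough sketch, assuming the `WanImageToVideoPipeline` class from Diffusers and a hypothetical resize step to preserve the input's aspect ratio (the Space may implement this differently):

```python
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

i2v_pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.2-TI2V-5B-Diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = load_image("input.jpg")
# Preserve the input's aspect ratio: fix the width, derive the height,
# and round to a multiple of 16 to satisfy typical VAE/patching constraints
width = 1280
height = round(image.height * width / image.width / 16) * 16

output = i2v_pipe(
    image=image.resize((width, height)),
    prompt="The scene slowly comes to life with gentle camera motion",
    height=height,
    width=width,
    num_frames=73,
    num_inference_steps=35,
    guidance_scale=5.0,
)
export_to_video(output.frames[0], "animated.mp4", fps=24)
```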

## Advanced Settings ⚙️

- **Width/Height**: Video resolution (default: 1280x704)
- **Number of Frames**: Longer videos need more frames (default: 73 frames ≈ 3 seconds, max: 145)
- **Inference Steps**: More steps = better quality but slower (default: 35, optimized for speed)
- **Guidance Scale**: How closely to follow the prompt (default: 5.0)
- **Seed**: Set a specific seed for reproducible results

**Note**: Settings are optimized to complete within Zero GPU's 3-minute time limit for Pro users.
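
These settings map directly onto the pipeline arguments, and the frame count fixes the clip length at 24 fps (73 / 24 ≈ 3.0 s, 145 / 24 ≈ 6.0 s). A small sketch of the mapping, including a fixed seed for reproducibility, assuming the same `pipe` and standard Diffusers parameter names as above:

```python
import torch

def duration_seconds(num_frames: int, fps: int = 24) -> float:
    """73 frames -> ~3.0 s; 145 frames -> ~6.0 s."""
    return num_frames / fps

generator = torch.Generator(device="cuda").manual_seed(42)  # fixed seed => reproducible output
output = pipe(
    prompt="A dragon flying over a medieval castle at sunset",
    width=1280,                # Width/Height
    height=704,
    num_frames=73,             # Number of Frames (73 / 24 fps ≈ 3 s)
    num_inference_steps=35,    # Inference Steps
    guidance_scale=5.0,        # Guidance Scale
    generator=generator,       # Seed
)
```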

## Tips for Best Results 💡

1. **Detailed Prompts**: Be specific about what you want to see
   - Good: "Two anthropomorphic cats in comfy boxing gear fight on stage with dramatic lighting"
   - Basic: "cats fighting"
2. **Image-to-Video**: Use clear, high-quality input images that match your prompt
3. **Quality vs Speed** (optimized for Zero GPU limits):
   - Fast: 25-30 steps (~2 minutes)
   - Balanced: 35 steps (default, ~2-3 minutes)
   - Higher Quality: 40-50 steps (~3+ minutes, may time out)
4. **Experiment**: Try different guidance scales (see the sketch after this list):
   - Lower (3-4): More creative, less literal
   - Default (5): Good balance
   - Higher (7-10): Strictly follows the prompt
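
One simple way to experiment is to hold the prompt and seed fixed and sweep only the guidance scale, so differences between videos come from that setting alone. A hypothetical local sketch (in the hosted UI this means changing the slider and regenerating):

```python
import torch
from diffusers.utils import export_to_video

prompt = "A bustling futuristic city at night with neon lights and flying cars"
for cfg in (3.0, 5.0, 8.0):  # creative -> balanced -> literal
    # Re-seed before each run so only the guidance scale changes between videos
    generator = torch.Generator(device="cuda").manual_seed(0)
    frames = pipe(
        prompt=prompt,
        num_frames=73,
        num_inference_steps=35,
        guidance_scale=cfg,
        generator=generator,
    ).frames[0]
    export_to_video(frames, f"city_cfg_{cfg}.mp4", fps=24)
```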

## Example Prompts 📝

- "Two anthropomorphic cats in comfy boxing gear fight on stage"
- "A serene underwater scene with colorful coral reefs and tropical fish swimming gracefully"
- "A bustling futuristic city at night with neon lights and flying cars"
- "A peaceful mountain landscape with snow-capped peaks and a flowing river"
- "An astronaut riding a horse through a nebula in deep space"
- "A dragon flying over a medieval castle at sunset"

## Technical Details 🔧

- **Model**: Wan-AI/Wan2.2-TI2V-5B-Diffusers
- **Framework**: Hugging Face Diffusers
- **Backend**: PyTorch with bfloat16 precision
- **GPU**: Hugging Face Zero GPU (H200 with 70GB VRAM, automatically allocated)
- **GPU Duration**: 180 seconds (3 minutes) for Pro users
- **Generation Time**: ~2-3 minutes with optimized settings (73 frames, 35 steps)
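
On Zero GPU, the GPU is attached only while a decorated function runs. The generation function in a Space like this is typically wired up with the `spaces` package's `@spaces.GPU` decorator and an explicit duration; the sketch below shows the pattern (not the actual `app.py`), reusing `pipe` and `torch` from the loading sketch above.

```python
import spaces  # Hugging Face ZeroGPU helper, available inside Spaces

@spaces.GPU(duration=180)  # request up to 180 s of GPU time for each call
def generate_video(prompt, num_frames=73, steps=35, guidance=5.0, seed=42):
    generator = torch.Generator(device="cuda").manual_seed(seed)
    result = pipe(
        prompt=prompt,
        num_frames=num_frames,
        num_inference_steps=steps,
        guidance_scale=guidance,
        generator=generator,
    )
    return result.frames[0]
```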

## Limitations ⚠️

- Generation requires compute time (2-3 minutes with default settings)
- Zero GPU allocation is time-limited (3 minutes for Pro, 60 seconds for Free)
- Videos longer than 6 seconds (145 frames) may time out
- Higher quality settings (50+ steps) may time out on Zero GPU
- Complex scenes with many objects may be challenging

## Credits 🙏

- **Model**: [Wan-AI](https://huggingface.co/Wan-AI)
- **Original Repository**: [Wan2.2](https://github.com/Wan-Video/Wan2.2)
- **Framework**: [Hugging Face Diffusers](https://github.com/huggingface/diffusers)

## License 📄

This Space uses the Wan2.2 model, which is released under the Apache 2.0 license.

## Related Links 🔗

- [Wan-AI on Hugging Face](https://huggingface.co/Wan-AI)
- [Original Model Card](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers)
- [Diffusers Documentation](https://huggingface.co/docs/diffusers)

---

**Note**: This is a community-created Space for easy access to Wan2.2 video generation. Generation times may vary based on current GPU availability.