---
title: Wan2.2 Video Generation
emoji: πŸŽ₯
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - video-generation
  - text-to-video
  - image-to-video
  - diffusers
  - wan
  - ai-video
  - zero-gpu
python_version: "3.10"
---

# Wan2.2 Video Generation πŸŽ₯

Generate high-quality videos from text prompts or images using the powerful **Wan2.2-TI2V-5B** model!

This Space provides an easy-to-use interface for creating videos with state-of-the-art AI technology.

## Features ✨

- **Text-to-Video**: Generate videos from descriptive text prompts
- **Image-to-Video**: Animate your images by adding an input image
- **High Quality**: 720P resolution at 24fps
- **Customizable**: Adjust resolution, number of frames, guidance scale, and more
- **Reproducible**: Use seeds to recreate your favorite generations

## Model Information πŸ€–

**Wan2.2-TI2V-5B** is a unified text-to-video and image-to-video generation model with:

- **5 billion parameters** optimized for consumer-grade GPUs
- **720P resolution** support (1280x704 default)
- **24 fps** smooth video output
- **Default duration**: 3 seconds, chosen to fit within Zero GPU time limits

The larger Wan2.2 models introduce a Mixture-of-Experts (MoE) architecture; TI2V-5B is a compact dense variant paired with a high-compression VAE, and it delivers video generation quality that rivals many commercial models.

## How to Use πŸš€

### Text-to-Video Generation

1. Enter your prompt describing the video you want to create
2. Adjust settings in "Advanced Settings" if desired
3. Click "Generate Video"
4. Wait for generation (typically 2-3 minutes on Zero GPU with default settings)
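
For reference, the text-to-video flow can be reproduced outside the UI with Diffusers. This is a minimal sketch based on the public model card for `Wan-AI/Wan2.2-TI2V-5B-Diffusers`; class names and defaults may shift between `diffusers` releases:

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"

# The VAE is commonly kept in float32 for decode stability,
# while the transformer runs in bfloat16 (as this Space does).
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="Two anthropomorphic cats in comfy boxing gear fight on stage",
    height=704,
    width=1280,
    num_frames=73,             # this Space's default: ~3 seconds at 24 fps
    num_inference_steps=35,
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "output.mp4", fps=24)
```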

### Image-to-Video Generation

1. Upload an input image
2. Enter a prompt describing how the image should animate
3. Click "Generate Video"
4. The output will maintain the aspect ratio of your input image
5. Generation takes 2-3 minutes with optimized settings
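
Under the hood this likely maps to the companion `WanImageToVideoPipeline`. In the sketch below, the resize step that preserves the input's aspect ratio (while keeping the pixel count near the 1280x704 budget) is an assumption about how the Space prepares images; rounding to multiples of 16 is a common, but version-dependent, constraint:

```python
import torch
from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"

vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanImageToVideoPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = load_image("input.jpg")  # hypothetical local file

# Preserve the input aspect ratio while staying near the 1280x704 pixel budget.
max_area = 1280 * 704
ratio = image.height / image.width
width = round((max_area / ratio) ** 0.5 / 16) * 16
height = round((max_area * ratio) ** 0.5 / 16) * 16
image = image.resize((width, height))

frames = pipe(
    image=image,
    prompt="The scene slowly comes to life with gentle camera motion",
    height=height,
    width=width,
    num_frames=73,
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "output.mp4", fps=24)
```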

## Advanced Settings βš™οΈ

- **Width/Height**: Video resolution (default: 1280x704)
- **Number of Frames**: More frames yield a longer clip (default: 73 frames β‰ˆ 3 seconds at 24 fps; max: 145 frames β‰ˆ 6 seconds)
- **Inference Steps**: More steps = better quality but slower (default: 35, optimized for speed)
- **Guidance Scale**: How closely to follow the prompt (default: 5.0)
- **Seed**: Set a specific seed for reproducible results

**Note**: Settings are optimized to complete within Zero GPU's 3-minute time limit for Pro users.
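
Two of these settings trip people up. Wan-style pipelines usually expect frame counts of the form 4k + 1 (note that both 73 and 145 fit), and reproducibility comes from passing a seeded `torch.Generator` to the pipeline call. A small sketch follows; the 4k + 1 constraint is an assumption carried over from the Wan pipelines in `diffusers`, so check your installed version:

```python
import torch

FPS = 24

def frames_for_seconds(seconds: float) -> int:
    """Snap a target duration to the nearest valid frame count (4k + 1)."""
    k = round(seconds * FPS / 4)
    return 4 * k + 1

print(frames_for_seconds(3))  # 73  -> the default ~3 s clip
print(frames_for_seconds(6))  # 145 -> the ~6 s maximum

# Reproducibility: seed a generator and pass it to the pipeline call, e.g.
# pipe(prompt=..., num_frames=73, generator=generator).frames[0]
generator = torch.Generator(device="cuda").manual_seed(42)
```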

## Tips for Best Results πŸ’‘

1. **Detailed Prompts**: Be specific about what you want to see
   - Good: "Two anthropomorphic cats in comfy boxing gear fight on stage with dramatic lighting"
   - Basic: "cats fighting"

2. **Image-to-Video**: Use clear, high-quality input images that match your prompt

3. **Quality vs Speed** (optimized for Zero GPU limits):
   - Fast: 25-30 steps (~2 minutes)
   - Balanced: 35 steps (default, ~2-3 minutes)
   - Higher Quality: 40-50 steps (~3+ minutes, may time out)

4. **Experiment**: Try different guidance scales:
   - Lower (3-4): More creative, less literal
   - Default (5): Good balance
   - Higher (7-10): Strictly follows prompt

## Example Prompts πŸ“

- "Two anthropomorphic cats in comfy boxing gear fight on stage"
- "A serene underwater scene with colorful coral reefs and tropical fish swimming gracefully"
- "A bustling futuristic city at night with neon lights and flying cars"
- "A peaceful mountain landscape with snow-capped peaks and a flowing river"
- "An astronaut riding a horse through a nebula in deep space"
- "A dragon flying over a medieval castle at sunset"

## Technical Details πŸ”§

- **Model**: Wan-AI/Wan2.2-TI2V-5B-Diffusers
- **Framework**: Hugging Face Diffusers
- **Backend**: PyTorch with bfloat16 precision
- **GPU**: Hugging Face Zero GPU (H200 with 70GB VRAM, automatically allocated)
- **GPU Duration**: 180 seconds (3 minutes) for Pro users
- **Generation Time**: ~2-3 minutes with optimized settings (73 frames, 35 steps)
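
On the Space side, Zero GPU time is typically requested with the `spaces.GPU` decorator. The sketch below shows the general pattern; the function name and body are illustrative, not this Space's actual `app.py`:

```python
import spaces
import torch
from diffusers import WanPipeline

# Loaded once at startup; Zero GPU attaches a GPU only while a
# @spaces.GPU-decorated function is running.
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-TI2V-5B-Diffusers", torch_dtype=torch.bfloat16
)

@spaces.GPU(duration=180)  # request up to 180 s of GPU time per call
def generate(prompt: str, num_frames: int = 73, steps: int = 35, seed: int = 42):
    pipe.to("cuda")
    generator = torch.Generator(device="cuda").manual_seed(seed)
    return pipe(
        prompt=prompt,
        num_frames=num_frames,
        num_inference_steps=steps,
        generator=generator,
    ).frames[0]
```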

## Limitations ⚠️

- Generation requires compute time (2-3 minutes with default settings)
- Zero GPU allocation is time-limited (3 minutes for Pro, 60 seconds for Free)
- Videos longer than 6 seconds (145 frames) may time out
- Higher quality settings (50+ steps) may time out on Zero GPU
- Complex scenes with many objects may be challenging

## Credits πŸ™

- **Model**: [Wan-AI](https://huggingface.co/Wan-AI)
- **Original Repository**: [Wan2.2](https://github.com/Wan-Video/Wan2.2)
- **Framework**: [Hugging Face Diffusers](https://github.com/huggingface/diffusers)

## License πŸ“„

This Space uses the Wan2.2 model, which is released under the Apache 2.0 license.

## Related Links πŸ”—

- [Wan-AI on Hugging Face](https://huggingface.co/Wan-AI)
- [Original Model Card](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers)
- [Diffusers Documentation](https://huggingface.co/docs/diffusers)

---

**Note**: This is a community-created Space for easy access to Wan2.2 video generation. Generation times may vary based on current GPU availability.