---
title: Wan2.2 Video Generation
emoji: πŸŽ₯
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - video-generation
  - text-to-video
  - image-to-video
  - diffusers
  - wan
  - ai-video
  - zero-gpu
python_version: "3.10"
---

# Wan2.2 Video Generation πŸŽ₯

Generate high-quality videos from text prompts or images using the powerful **Wan2.2-TI2V-5B** model!

This Space provides an easy-to-use interface for creating videos with state-of-the-art AI technology.

## Features ✨

- **Text-to-Video**: Generate videos from descriptive text prompts
- **Image-to-Video**: Animate your images by adding an input image
- **High Quality**: 720P resolution at 24fps
- **Customizable**: Adjust resolution, number of frames, guidance scale, and more
- **Reproducible**: Use seeds to recreate your favorite generations

## Model Information πŸ€–

**Wan2.2-TI2V-5B** is a unified text-to-video and image-to-video generation model with:

- **5 billion parameters** optimized for consumer-grade GPUs
- **720P resolution** support (1280x704 default)
- **24 fps** smooth video output
- **Default duration**: 3 seconds, chosen to fit within Zero GPU time limits

The larger Wan2.2 models introduce a Mixture-of-Experts (MoE) architecture; TI2V-5B is a compact dense variant paired with a high-compression VAE, and it delivers video generation quality that rivals many commercial models.

## How to Use πŸš€

### Text-to-Video Generation

1. Enter your prompt describing the video you want to create
2. Adjust settings in "Advanced Settings" if desired
3. Click "Generate Video"
4. Wait for generation (typically 2-3 minutes on Zero GPU with default settings)
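
For reference, the text-to-video flow can be reproduced outside the UI with Diffusers. This is a minimal sketch based on the public model card for `Wan-AI/Wan2.2-TI2V-5B-Diffusers`; class names and defaults may shift between `diffusers` releases:

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"

# The VAE is commonly kept in float32 for decode stability,
# while the transformer runs in bfloat16 (as this Space does).
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="Two anthropomorphic cats in comfy boxing gear fight on stage",
    height=704,
    width=1280,
    num_frames=73,             # this Space's default: ~3 seconds at 24 fps
    num_inference_steps=35,
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "output.mp4", fps=24)
```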

### Image-to-Video Generation

1. Upload an input image
2. Enter a prompt describing how the image should animate
3. Click "Generate Video"
4. The output will maintain the aspect ratio of your input image
5. Generation takes 2-3 minutes with optimized settings
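
Under the hood this likely maps to the companion `WanImageToVideoPipeline`. In the sketch below, the resize step that preserves the input's aspect ratio (while keeping the pixel count near the 1280x704 budget) is an assumption about how the Space prepares images; rounding to multiples of 16 is a common, but version-dependent, constraint:

```python
import torch
from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"

vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanImageToVideoPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = load_image("input.jpg")  # hypothetical local file

# Preserve the input aspect ratio while staying near the 1280x704 pixel budget.
max_area = 1280 * 704
ratio = image.height / image.width
width = round((max_area / ratio) ** 0.5 / 16) * 16
height = round((max_area * ratio) ** 0.5 / 16) * 16
image = image.resize((width, height))

frames = pipe(
    image=image,
    prompt="The scene slowly comes to life with gentle camera motion",
    height=height,
    width=width,
    num_frames=73,
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "output.mp4", fps=24)
```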

## Advanced Settings βš™οΈ

- **Width/Height**: Video resolution (default: 1280x704)
- **Number of Frames**: More frames yield a longer clip (default: 73 frames β‰ˆ 3 seconds at 24 fps; max: 145 frames β‰ˆ 6 seconds)
- **Inference Steps**: More steps = better quality but slower (default: 35, optimized for speed)
- **Guidance Scale**: How closely to follow the prompt (default: 5.0)
- **Seed**: Set a specific seed for reproducible results

**Note**: Settings are optimized to complete within Zero GPU's 3-minute time limit for Pro users.
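
Two of these settings trip people up. Wan-style pipelines usually expect frame counts of the form 4k + 1 (note that both 73 and 145 fit), and reproducibility comes from passing a seeded `torch.Generator` to the pipeline call. A small sketch follows; the 4k + 1 constraint is an assumption carried over from the Wan pipelines in `diffusers`, so check your installed version:

```python
import torch

FPS = 24

def frames_for_seconds(seconds: float) -> int:
    """Snap a target duration to the nearest valid frame count (4k + 1)."""
    k = round(seconds * FPS / 4)
    return 4 * k + 1

print(frames_for_seconds(3))  # 73  -> the default ~3 s clip
print(frames_for_seconds(6))  # 145 -> the ~6 s maximum

# Reproducibility: seed a generator and pass it to the pipeline call, e.g.
# pipe(prompt=..., num_frames=73, generator=generator).frames[0]
generator = torch.Generator(device="cuda").manual_seed(42)
```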

## Tips for Best Results πŸ’‘

1. **Detailed Prompts**: Be specific about what you want to see
   - Good: "Two anthropomorphic cats in comfy boxing gear fight on stage with dramatic lighting"
   - Basic: "cats fighting"

2. **Image-to-Video**: Use clear, high-quality input images that match your prompt

3. **Quality vs Speed** (optimized for Zero GPU limits):
   - Fast: 25-30 steps (~2 minutes)
   - Balanced: 35 steps (default, ~2-3 minutes)
   - Higher Quality: 40-50 steps (~3+ minutes, may time out)

4. **Experiment**: Try different guidance scales:
   - Lower (3-4): More creative, less literal
   - Default (5): Good balance
   - Higher (7-10): Strictly follows prompt

## Example Prompts πŸ“

- "Two anthropomorphic cats in comfy boxing gear fight on stage"
- "A serene underwater scene with colorful coral reefs and tropical fish swimming gracefully"
- "A bustling futuristic city at night with neon lights and flying cars"
- "A peaceful mountain landscape with snow-capped peaks and a flowing river"
- "An astronaut riding a horse through a nebula in deep space"
- "A dragon flying over a medieval castle at sunset"

## Technical Details πŸ”§

- **Model**: Wan-AI/Wan2.2-TI2V-5B-Diffusers
- **Framework**: Hugging Face Diffusers
- **Backend**: PyTorch with bfloat16 precision
- **GPU**: Hugging Face Zero GPU (H200 with 70GB VRAM, automatically allocated)
- **GPU Duration**: 180 seconds (3 minutes) for Pro users
- **Generation Time**: ~2-3 minutes with optimized settings (73 frames, 35 steps)
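
On the Space side, Zero GPU time is typically requested with the `spaces.GPU` decorator. The sketch below shows the general pattern; the function name and body are illustrative, not this Space's actual `app.py`:

```python
import spaces
import torch
from diffusers import WanPipeline

# Loaded once at startup; Zero GPU attaches a GPU only while a
# @spaces.GPU-decorated function is running.
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-TI2V-5B-Diffusers", torch_dtype=torch.bfloat16
)

@spaces.GPU(duration=180)  # request up to 180 s of GPU time per call
def generate(prompt: str, num_frames: int = 73, steps: int = 35, seed: int = 42):
    pipe.to("cuda")
    generator = torch.Generator(device="cuda").manual_seed(seed)
    return pipe(
        prompt=prompt,
        num_frames=num_frames,
        num_inference_steps=steps,
        generator=generator,
    ).frames[0]
```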

## Limitations ⚠️

- Generation requires compute time (2-3 minutes with default settings)
- Zero GPU allocation is time-limited (3 minutes for Pro, 60 seconds for Free)
- Videos longer than 6 seconds (145 frames) may time out
- Higher quality settings (50+ steps) may time out on Zero GPU
- Complex scenes with many objects may be challenging

## Credits πŸ™

- **Model**: [Wan-AI](https://huggingface.co/Wan-AI)
- **Original Repository**: [Wan2.2](https://github.com/Wan-Video/Wan2.2)
- **Framework**: [Hugging Face Diffusers](https://github.com/huggingface/diffusers)

## License πŸ“„

This Space uses the Wan2.2 model, which is released under the Apache 2.0 license.

## Related Links πŸ”—

- [Wan-AI on Hugging Face](https://huggingface.co/Wan-AI)
- [Original Model Card](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers)
- [Diffusers Documentation](https://huggingface.co/docs/diffusers)

---

**Note**: This is a community-created Space for easy access to Wan2.2 video generation. Generation times may vary based on current GPU availability.