---
title: Wan2.2 Video Generation
emoji: 🎥
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - video-generation
  - text-to-video
  - image-to-video
  - diffusers
  - wan
  - ai-video
  - zero-gpu
python_version: "3.10"
---

# Wan2.2 Video Generation 🎥

Generate high-quality videos from text prompts or images using the powerful **Wan2.2-TI2V-5B** model!
This Space provides an easy-to-use interface for creating videos with state-of-the-art AI technology.

## Features ✨

- **Text-to-Video**: Generate videos from descriptive text prompts
- **Image-to-Video**: Animate your images by adding an input image
- **High Quality**: 720P resolution at 24 fps
- **Customizable**: Adjust resolution, number of frames, guidance scale, and more
- **Reproducible**: Use seeds to recreate your favorite generations

## Model Information 🤖

**Wan2.2-TI2V-5B** is a unified text-to-video and image-to-video generation model with:

- **5 billion parameters** optimized for consumer-grade GPUs
- **720P resolution** support (1280x704 default)
- **24 fps** smooth video output
- **3-second default duration**, tuned to fit within Zero GPU time limits

The Wan2.2 family introduces a Mixture-of-Experts (MoE) architecture in its larger A14B models; this compact 5B TI2V variant is a dense model that still delivers outstanding video generation quality, surpassing many commercial models.
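
For reference, loading the model with Diffusers looks roughly like the snippet below. This is a minimal sketch assuming the standard `WanPipeline` API in recent Diffusers releases, not the exact code in `app.py`; see the model card for the officially recommended loading snippet.

```python
import torch
from diffusers import WanPipeline

# Load the Diffusers-format checkpoint in bfloat16, the precision this Space uses
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-TI2V-5B-Diffusers",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")
```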

## How to Use 🚀

### Text-to-Video Generation

1. Enter your prompt describing the video you want to create
2. Adjust settings in "Advanced Settings" if desired
3. Click "Generate Video"
4. Wait for generation (typically 2-3 minutes on Zero GPU with default settings)
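
Under the hood, a text-to-video request corresponds roughly to a single pipeline call like the one below, using the defaults listed under Advanced Settings. Parameter names follow the standard Diffusers video pipeline interface and may differ slightly from the actual `app.py`.

```python
from diffusers.utils import export_to_video

# One text-to-video call with the Space's default settings
output = pipe(
    prompt="Two anthropomorphic cats in comfy boxing gear fight on stage",
    height=704,
    width=1280,
    num_frames=73,             # ~3 seconds at 24 fps
    num_inference_steps=35,
    guidance_scale=5.0,
)
export_to_video(output.frames[0], "output.mp4", fps=24)
```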

### Image-to-Video Generation

1. Upload an input image
2. Enter a prompt describing how the image should animate
3. Click "Generate Video"
4. The output will maintain the aspect ratio of your input image
5. Generation takes 2-3 minutes with optimized settings
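
Image-to-video follows the same pattern with an extra `image` input. A rough sketch, assuming the `WanImageToVideoPipeline` class from Diffusers and a hypothetical resize step to preserve the input's aspect ratio (the Space may implement this differently):

```python
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

i2v_pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.2-TI2V-5B-Diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = load_image("input.jpg")
# Preserve the input's aspect ratio: fix the width, derive the height,
# and round to a multiple of 16 to satisfy typical VAE/patching constraints
width = 1280
height = round(image.height * width / image.width / 16) * 16

output = i2v_pipe(
    image=image.resize((width, height)),
    prompt="The scene slowly comes to life with gentle camera motion",
    height=height,
    width=width,
    num_frames=73,
    num_inference_steps=35,
    guidance_scale=5.0,
)
export_to_video(output.frames[0], "animated.mp4", fps=24)
```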

## Advanced Settings ⚙️

- **Width/Height**: Video resolution (default: 1280x704)
- **Number of Frames**: Longer videos need more frames (default: 73 frames ≈ 3 seconds, max: 145)
- **Inference Steps**: More steps = better quality but slower (default: 35, optimized for speed)
- **Guidance Scale**: How closely to follow the prompt (default: 5.0)
- **Seed**: Set a specific seed for reproducible results

**Note**: Settings are optimized to complete within Zero GPU's 3-minute time limit for Pro users.
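
These settings map directly onto the pipeline arguments, and the frame count fixes the clip length at 24 fps (73 / 24 ≈ 3.0 s, 145 / 24 ≈ 6.0 s). A small sketch of the mapping, including a fixed seed for reproducibility, assuming the same `pipe` and standard Diffusers parameter names as above:

```python
import torch

def duration_seconds(num_frames: int, fps: int = 24) -> float:
    """73 frames -> ~3.0 s; 145 frames -> ~6.0 s."""
    return num_frames / fps

generator = torch.Generator(device="cuda").manual_seed(42)  # fixed seed => reproducible output
output = pipe(
    prompt="A dragon flying over a medieval castle at sunset",
    width=1280,                # Width/Height
    height=704,
    num_frames=73,             # Number of Frames (73 / 24 fps ≈ 3 s)
    num_inference_steps=35,    # Inference Steps
    guidance_scale=5.0,        # Guidance Scale
    generator=generator,       # Seed
)
```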

## Tips for Best Results 💡

1. **Detailed Prompts**: Be specific about what you want to see
   - Good: "Two anthropomorphic cats in comfy boxing gear fight on stage with dramatic lighting"
   - Basic: "cats fighting"
2. **Image-to-Video**: Use clear, high-quality input images that match your prompt
3. **Quality vs Speed** (optimized for Zero GPU limits):
   - Fast: 25-30 steps (~2 minutes)
   - Balanced: 35 steps (default, ~2-3 minutes)
   - Higher Quality: 40-50 steps (~3+ minutes, may time out)
4. **Experiment**: Try different guidance scales (see the sketch after this list):
   - Lower (3-4): More creative, less literal
   - Default (5): Good balance
   - Higher (7-10): Strictly follows the prompt
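
One simple way to experiment is to hold the prompt and seed fixed and sweep only the guidance scale, so differences between videos come from that setting alone. A hypothetical local sketch (in the hosted UI this means changing the slider and regenerating):

```python
import torch
from diffusers.utils import export_to_video

prompt = "A bustling futuristic city at night with neon lights and flying cars"
for cfg in (3.0, 5.0, 8.0):  # creative -> balanced -> literal
    # Re-seed before each run so only the guidance scale changes between videos
    generator = torch.Generator(device="cuda").manual_seed(0)
    frames = pipe(
        prompt=prompt,
        num_frames=73,
        num_inference_steps=35,
        guidance_scale=cfg,
        generator=generator,
    ).frames[0]
    export_to_video(frames, f"city_cfg_{cfg}.mp4", fps=24)
```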

## Example Prompts 📝

- "Two anthropomorphic cats in comfy boxing gear fight on stage"
- "A serene underwater scene with colorful coral reefs and tropical fish swimming gracefully"
- "A bustling futuristic city at night with neon lights and flying cars"
- "A peaceful mountain landscape with snow-capped peaks and a flowing river"
- "An astronaut riding a horse through a nebula in deep space"
- "A dragon flying over a medieval castle at sunset"

## Technical Details 🔧

- **Model**: Wan-AI/Wan2.2-TI2V-5B-Diffusers
- **Framework**: Hugging Face Diffusers
- **Backend**: PyTorch with bfloat16 precision
- **GPU**: Hugging Face Zero GPU (H200 with 70GB VRAM, automatically allocated)
- **GPU Duration**: 180 seconds (3 minutes) for Pro users
- **Generation Time**: ~2-3 minutes with optimized settings (73 frames, 35 steps)
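
On Zero GPU, the GPU is attached only while a decorated function runs. The generation function in a Space like this is typically wired up with the `spaces` package's `@spaces.GPU` decorator and an explicit duration; the sketch below shows the pattern (not the actual `app.py`), reusing `pipe` and `torch` from the loading sketch above.

```python
import spaces  # Hugging Face ZeroGPU helper, available inside Spaces

@spaces.GPU(duration=180)  # request up to 180 s of GPU time for each call
def generate_video(prompt, num_frames=73, steps=35, guidance=5.0, seed=42):
    generator = torch.Generator(device="cuda").manual_seed(seed)
    result = pipe(
        prompt=prompt,
        num_frames=num_frames,
        num_inference_steps=steps,
        guidance_scale=guidance,
        generator=generator,
    )
    return result.frames[0]
```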

## Limitations ⚠️

- Generation requires compute time (2-3 minutes with default settings)
- Zero GPU allocation is time-limited (3 minutes for Pro, 60 seconds for Free)
- Videos longer than 6 seconds (145 frames) may time out
- Higher quality settings (50+ steps) may time out on Zero GPU
- Complex scenes with many objects may be challenging

## Credits 🙏

- **Model**: [Wan-AI](https://huggingface.co/Wan-AI)
- **Original Repository**: [Wan2.2](https://github.com/Wan-Video/Wan2.2)
- **Framework**: [Hugging Face Diffusers](https://github.com/huggingface/diffusers)

## License 📄

This Space uses the Wan2.2 model, which is released under the Apache 2.0 license.

## Related Links 🔗

- [Wan-AI on Hugging Face](https://huggingface.co/Wan-AI)
- [Original Model Card](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers)
- [Diffusers Documentation](https://huggingface.co/docs/diffusers)

---

**Note**: This is a community-created Space for easy access to Wan2.2 video generation. Generation times may vary based on current GPU availability.