
Deployment Guide for Wan2.2 on Hugging Face Spaces

This guide explains how to deploy the Wan2.2 video generation model to Hugging Face Spaces with Zero GPU support.

Prerequisites

  1. A Hugging Face account (create one at https://huggingface.co/join)
  2. Git installed on your local machine
  3. Git LFS (Large File Storage) installed

Deployment Steps

Option 1: Deploy via Hugging Face Web Interface

  1. Create a New Space

    • Go to https://huggingface.co/new-space
    • Choose a name for your Space (e.g., "wan2-video-gen")
    • Select "Gradio" as the SDK
    • Choose "Public" or "Private" visibility
    • Click "Create Space"
  2. Upload Files

    • Use the web interface to upload files:
      • app.py
      • requirements.txt
      • README.md
      • .gitignore
  3. Enable Zero GPU

    • In your Space settings, enable "Zero GPU"
    • This provides automatic GPU allocation during inference
  4. Wait for Build

    • Hugging Face will automatically build your Space
    • This may take 10-15 minutes for the first build
    • Check the build logs for any errors

Option 2: Deploy via Git (Recommended)

  1. Clone Your Space

    git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
    cd YOUR_SPACE_NAME
    
  2. Copy Files

    # Copy all files (including dotfiles such as .gitignore) from the huggingface-wan2.2 directory
    cp -r /path/to/huggingface-wan2.2/. .
    
  3. Commit and Push

    git add .
    git commit -m "Initial deployment of Wan2.2 video generation"
    git push
    
  4. Enable Zero GPU

    • Go to your Space settings on Hugging Face
    • Navigate to "Settings" → "Zero GPU"
    • Enable Zero GPU support

Option 3: Deploy from This Repository

If you've already cloned this repository:

cd /home/user/Kakka/huggingface-wan2.2

# Initialize git if not already done
git init

# Add Hugging Face Space as remote
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

# Commit files
git add .
git commit -m "Initial deployment of Wan2.2 video generation"

# Push to Hugging Face
git push hf main

Configuration

Zero GPU Settings

The app is configured to use Zero GPU with the following settings:

  • Duration: 180 seconds (3 minutes) per generation
  • Allocation: Automatic (triggered by generation request)
  • Optimized defaults: Reduced frames (73) and steps (35) to fit within time limit

This is configured in app.py with the decorator:

@spaces.GPU(duration=180)  # 3 minutes max for Pro accounts

Important: Even with a Pro subscription, the maximum GPU duration is limited to 180 seconds (3 minutes). The default settings have been optimized to complete generation within this time:

  • Default frames: 73 (3 seconds of video at 24fps)
  • Default inference steps: 35 (balanced speed/quality)
  • Maximum frames slider: 145 (6 seconds)
  • Maximum inference steps: 60
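
As a quick sanity check on the defaults above, the frame count maps to clip length by simple division at the 24 fps output rate. The video_seconds helper below is an illustrative sketch, not part of app.py:

```python
# Clip duration from frame count, assuming 24 fps output as described above.
# video_seconds is a hypothetical helper for illustration.

def video_seconds(num_frames: int, fps: int = 24) -> float:
    """Return the clip duration in seconds."""
    return num_frames / fps

print(video_seconds(73))   # default: ~3 seconds
print(video_seconds(145))  # slider maximum: ~6 seconds
```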

Memory Requirements

The Wan2.2-TI2V-5B model requires:

  • Minimum: 24GB VRAM
  • Recommended: 40GB+ VRAM for Zero GPU

Zero GPU on Hugging Face Spaces provides sufficient VRAM for this model (H200 GPU with 70GB).
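
A rough back-of-envelope calculation shows why the weights alone account for a large share of the minimum: at bf16/fp16 precision (2 bytes per parameter, an assumption here), 5B parameters need about 10 GB before activations, the VAE, and latent buffers are counted. The weight_gb helper is illustrative only:

```python
# Back-of-envelope estimate of weight memory alone, assuming bf16/fp16
# precision (2 bytes per parameter). Real usage is higher: activations,
# the VAE, and latent buffers all add to the working set.

def weight_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for model weights, in GB."""
    return num_params * bytes_per_param / 1e9

print(weight_gb(5e9))  # the 5B model's weights alone: ~10 GB
```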

Testing Your Deployment

  1. Wait for Build to Complete

    • Check the build logs in your Space
    • Wait for "Running" status
  2. Test Basic Generation

    • Try the default example: "Two anthropomorphic cats in comfy boxing gear fight on stage"
    • Generation should take roughly 2-3 minutes with the optimized default settings
  3. Test Image-to-Video

    • Upload a test image
    • Add a descriptive prompt
    • Verify video generation works

Troubleshooting

Critical: Import Order Issue

Issue: RuntimeError: CUDA has been initialized before importing the 'spaces' package

Solution: The spaces package MUST be imported BEFORE any CUDA-related packages (torch, diffusers, etc.)

Correct import order in app.py:

# IMPORTANT: spaces must be imported first
import spaces

# Standard library imports
import os

# Third-party imports (non-CUDA)
import numpy as np
from PIL import Image
import gradio as gr

# CUDA-related imports (must come after spaces)
import torch
from diffusers import WanPipeline, AutoencoderKLWan

Why this matters: Hugging Face Zero GPU needs to manage CUDA initialization. If torch or other CUDA libraries initialize CUDA before spaces is imported, Zero GPU cannot properly manage GPU allocation.

Build Fails

Issue: Requirements installation fails

  • Solution: Check requirements.txt for compatibility issues
  • Ensure PyTorch version is compatible with CUDA on Zero GPU
  • Make sure you are using a recent Gradio version (5.49.0+) for the latest security fixes

Issue: Out of memory during build

  • Solution: Zero GPU should have enough memory; check model loading code

Issue: "Can't initialize NVML" warnings

  • Solution: These warnings are normal in the Zero GPU environment at build time
  • They should not affect runtime when GPU is allocated

Runtime Errors

Issue: "CUDA out of memory"

  • Solution: Reduce num_frames or image resolution
  • Check if Zero GPU is properly enabled in settings

Issue: "Model not found"

  • Solution: Verify internet connection for model download
  • Check Hugging Face Hub status

Issue: Generation timeout

  • Solution: Reduce inference steps or video length
  • Increase the GPU duration in @spaces.GPU(duration=XX), up to your tier's maximum (180 seconds for Pro)

Issue: Gradio security vulnerability warning

  • Solution: Update to Gradio 5.49.0 or later in requirements.txt
  • Check README.md YAML front matter has correct sdk_version: 5.49.0
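
For reference, a minimal Spaces front matter consistent with the version above might look like this (title and emoji are placeholder values):

```yaml
title: Wan2 Video Generation
emoji: 🎬
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
```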

Issue: "ZeroGPU illegal duration! The requested GPU duration (Xs) is larger than the maximum allowed"

  • Solution: Reduce the duration parameter in @spaces.GPU(duration=XX)
  • For Pro accounts, use 180 seconds or less: @spaces.GPU(duration=180)
  • Free tier typically limited to 60 seconds
  • Optimize your default settings to complete within the time limit:
    • Reduce num_frames (e.g., 73 for 3 seconds instead of 121 for 5 seconds)
    • Reduce num_inference_steps (e.g., 35 instead of 50)
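
One way to reason about whether a settings combination fits the window is a simple time budget. The numbers below are assumptions for illustration: seconds_per_step (~4.5 s) is inferred from the "2-3 minutes for 35 steps" figure quoted later in this guide, and overhead_s is a guess for setup and VAE decode; profile your own Space to calibrate both:

```python
# Rough check that a settings combination fits the Zero GPU time limit.
# seconds_per_step and overhead_s are assumed ballpark values, not
# measured numbers; calibrate them against your Space's logs.

GPU_LIMIT_S = 180  # Pro-tier maximum, matching @spaces.GPU(duration=180)

def fits_budget(num_steps: int, seconds_per_step: float = 4.5,
                overhead_s: float = 15.0) -> bool:
    """Return True if estimated generation time fits the GPU window."""
    return num_steps * seconds_per_step + overhead_s <= GPU_LIMIT_S

print(fits_budget(35))  # optimized default: fits under these assumptions
print(fits_budget(50))  # likely exceeds the window under these assumptions
```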

Slow Generation

Issue: Generation takes too long

  • Solution: This is expected; video generation is compute-intensive
  • Typical time: 2-3 minutes for 3-second video with optimized settings (73 frames, 35 steps)
  • Consider reducing num_inference_steps to 25-30 for faster (but lower quality) results
  • Note: Must complete within 180 seconds (3 minutes) for Pro, 60 seconds for Free tier

Optimization Tips

  1. Current Optimized Settings

    • Already optimized: num_frames=73 (3 seconds) and num_inference_steps=35
    • These settings are designed to complete within 180-second Zero GPU limit
    • For even faster testing, reduce steps to 25-30
  2. Add Caching (Optional)

    • Enable example caching with cache_examples=True to pre-generate examples
    • Note: This increases build time and storage requirements
    • Current setting: cache_examples=False for faster builds
  3. Queue Management

    • Current setting: demo.queue(max_size=20)
    • Adjust based on expected traffic
    • Larger queue = more concurrent users but more resource usage
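
As a rough illustration of this trade-off, worst-case wait grows linearly with queue length. The worst_case_wait_min helper is hypothetical and assumes each generation uses the full 180-second window:

```python
# Worst-case queue wait: position in a full queue × average generation time.
# Assumes every generation takes the full 180-second GPU window.

def worst_case_wait_min(queue_len: int, gen_seconds: float = 180.0) -> float:
    """Minutes a user at the back of a full queue might wait."""
    return queue_len * gen_seconds / 60.0

print(worst_case_wait_min(20))  # max_size=20 → up to ~60 minutes
```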

Customization

Change Default Model

To use a different Wan2.2 variant, modify app.py:

# For larger model with better quality
MODEL_ID = "Wan-AI/Wan2.2-T2V-A14B-Diffusers"

# For image-to-video focused
MODEL_ID = "Wan-AI/Wan2.2-I2V-A14B-Diffusers"

Adjust UI

Modify the Gradio interface in app.py:

  • Change default values in sliders
  • Add more examples
  • Customize theme and styling

Add Features

Consider adding:

  • Video upscaling
  • Multiple video outputs
  • Batch generation
  • Download history
  • Custom aspect ratios

Monitoring

Check Space Status

  • Visit your Space URL
  • Check "Settings" → "Logs" for runtime logs
  • Monitor usage in "Settings" → "Analytics"

Usage Limits

Zero GPU on Hugging Face has:

  • Time limits per session
  • Concurrent user limits
  • Monthly compute quotas (check your tier)

Support

If you encounter issues:

  1. Check Logs: Space logs often contain error details
  2. Hugging Face Forums: https://discuss.huggingface.co/
  3. Model Issues: Report at Wan-AI's GitHub or model card
  4. Space Settings: Verify Zero GPU is enabled and quota is available

License

This deployment uses:

  • Wan2.2 model (Apache 2.0)
  • Gradio (Apache 2.0)
  • Diffusers (Apache 2.0)

Ensure compliance with all licenses when deploying.


Happy Deploying! πŸš€