How to use onkarsus13/MMVQVae with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline
# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("onkarsus13/MMVQVae", dtype=torch.bfloat16, device_map="cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
Official Implementation - WACV 2026
This repository provides the official implementation of the paper:
Pyramidal Spectrum: Frequency-based Hierarchically Vector Quantized VAE for Videos
Accepted at WACV 2026
We introduce a new autoencoder trained on 4K-resolution video data, featuring a hierarchical frequency-based vector quantization method.
The model leverages a pyramidal spectral representation to produce high-fidelity video reconstructions with an efficient latent structure.
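To make the idea of quantizing a pyramid of frequency bands concrete, here is a minimal NumPy sketch. It is an illustration only, not the paper's architecture: the Laplacian-style pyramid built from 2x2 average pooling, the fixed uniform codebooks, and the names `split_bands`, `merge_bands`, and `quantize` are all assumptions introduced for this example. The actual model learns its encoder, decoder, and codebooks.

```python
import numpy as np


def split_bands(frame, levels=3):
    """Split a 2D frame into bands ordered from finest detail to coarsest.

    Assumes both sides of `frame` are divisible by 2 ** (levels - 1).
    """
    bands = []
    current = frame.astype(np.float64)
    for _ in range(levels - 1):
        h, w = current.shape
        # Low-pass by 2x2 average pooling, then upsample by pixel repetition.
        low = current.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        up = low.repeat(2, axis=0).repeat(2, axis=1)
        bands.append(current - up)  # higher-frequency residual at this scale
        current = low
    bands.append(current)  # coarsest, lowest-frequency band
    return bands


def merge_bands(bands):
    """Invert split_bands: upsample the coarse band and add details back."""
    current = bands[-1]
    for detail in reversed(bands[:-1]):
        current = current.repeat(2, axis=0).repeat(2, axis=1) + detail
    return current


def quantize(vectors, codebook):
    """Nearest-neighbor vector quantization.

    vectors: (N, D) array, codebook: (K, D) array.
    Returns the chosen indices (N,) and the quantized vectors (N, D).
    """
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]


# Demo with one codebook per band. Entries are 1-dimensional here for brevity;
# a real model would use learned, higher-dimensional codebooks.
rng = np.random.default_rng(0)
frame = rng.random((8, 8))
bands = split_bands(frame, levels=3)
codebook = np.linspace(-1.0, 1.0, 65).reshape(-1, 1)

quantized_bands = []
for band in bands:
    _, q = quantize(band.reshape(-1, 1), codebook)
    quantized_bands.append(q.reshape(band.shape))

recon = merge_bands(quantized_bands)
print("max reconstruction error:", np.abs(recon - frame).max())
```

Each band gets its own discrete code map, so coarse structure and fine detail are stored separately. In this toy setup the reconstruction error is bounded by the sum of the per-band quantization errors.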
This implementation requires installing Diffusers from the custom branch:
pip install git+https://github.com/Onkarsus13/diffusers@MMVQVae
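After installing, a quick check that a `diffusers` distribution is visible to the current Python environment can save some debugging. The sketch below uses only the standard library; the helper name `installed_version` is made up for this example. It reports the installed version string but cannot confirm which Git branch the package was built from.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("diffusers:", installed_version("diffusers") or "not installed")
```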
@inproceedings{pyramidal_spectrum_wacv2026,
title = {Pyramidal Spectrum: Frequency-based Hierarchically Vector Quantized VAE for Videos},
author = {Tushar Prakash and Onkar Susladkar and Inderjit Dhillon and Sparsh Mittal},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
year = {2026}
}