🌊 LiquidDiffusion

A novel attention-free image generation model based on Liquid Neural Networks

What is this?

LiquidDiffusion is an image generation model that replaces attention with Parallel CfC (Closed-form Continuous-depth) blocks from Liquid Neural Network research. To the authors' knowledge, no published paper combines LNNs with image generation; this project aims to fill that gap.

Key Properties

  • ✅ Zero attention layers: fully convolutional + liquid time-gating
  • ✅ Fully parallelizable: no ODE solvers, no sequential scanning, no recurrence
  • ✅ Latent-space training: uses a pretrained SD-VAE (stabilityai/sd-vae-ft-mse, 83.7M params, frozen)
  • ✅ Fits in 16GB VRAM: the tiny config runs 256px at batch=8 on a T4 GPU
  • ✅ Simple training: Rectified Flow (MSE velocity prediction, no noise schedule)
  • ✅ 6 verified datasets: all tested and working, with streaming support

Quick Start (Colab)

  1. Open LiquidDiffusion_Training.ipynb in Colab
  2. Select GPU runtime (T4)
  3. Pick a dataset from the dropdown (default: huggan/AFHQv2, animal faces)
  4. Run all cells → training starts, samples are generated every 500 steps

Architecture

Pixel Image (3×256×256)
    → [Frozen SD-VAE Encode] → Latent (4×32×32)
    → [LiquidDiffusion U-Net] → Velocity prediction (4×32×32)
    → [Frozen SD-VAE Decode] → Generated Image (3×256×256)

Each LiquidDiffusionBlock contains:

  1. AdaLN: timestep conditioning via learned scale/shift
  2. ParallelCfCBlock: the core liquid neural network layer (CfC Eq.10)
  3. MultiScaleSpatialMix: 3×3 + 5×5 + 7×7 depthwise conv + global pooling (replaces attention)
  4. FeedForward: channel mixing via 1×1 conv
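As a rough sketch, the non-CfC pieces of a block (AdaLN conditioning, the multi-scale spatial mix, and the 1×1 feed-forward) could be wired up as below. Layer widths, the GroupNorm choice, and the pooling projection are assumptions, not the repository's actual code, and the ParallelCfCBlock step (described in the next section) is omitted here:

```python
import torch
import torch.nn as nn

class MultiScaleSpatialMix(nn.Module):
    """Depthwise 3x3 + 5x5 + 7x7 convs plus a pooled global-context term."""
    def __init__(self, dim):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(dim, dim, k, padding=k // 2, groups=dim) for k in (3, 5, 7)
        )
        self.pool_proj = nn.Conv2d(dim, dim, 1)  # projects pooled statistics

    def forward(self, x):
        out = sum(conv(x) for conv in self.convs)
        # global pooling broadcast back over the spatial grid
        return out + self.pool_proj(x.mean(dim=(2, 3), keepdim=True))

class LiquidDiffusionBlock(nn.Module):
    """Sketch of one block: AdaLN -> spatial mix -> 1x1 feed-forward."""
    def __init__(self, dim, t_dim):
        super().__init__()
        self.norm = nn.GroupNorm(8, dim, affine=False)
        self.ada = nn.Linear(t_dim, 2 * dim)  # learned scale/shift from t_emb
        self.spatial = MultiScaleSpatialMix(dim)
        self.ffn = nn.Sequential(nn.Conv2d(dim, 4 * dim, 1), nn.GELU(),
                                 nn.Conv2d(4 * dim, dim, 1))

    def forward(self, x, t_emb):
        scale, shift = self.ada(t_emb)[:, :, None, None].chunk(2, dim=1)
        h = self.norm(x) * (1 + scale) + shift   # AdaLN conditioning
        h = self.spatial(h)                      # attention replacement
        return x + self.ffn(h)                   # channel mixing + residual
```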

The ParallelCfC Block

# CfC Eq.10 adapted for images:
gate = σ(time_a(t_emb) · f(features) - time_b(t_emb))    # liquid time-gating
out = gate · g(features) + (1 - gate) · h(features)      # CfC interpolation
α = exp(-λ · |t_emb|)                                    # liquid relaxation
output = α · input + (1 - α) · out                       # time-aware residual
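These equations can be turned into a runnable per-pixel layer as below. This is a sketch under stated assumptions: f, g, h are taken to be 1×1 convs, λ is a learned per-channel parameter, and |t_emb| is reduced to a per-sample scalar by a mean over the embedding, since the pseudocode leaves that reduction unspecified:

```python
import torch
import torch.nn as nn

class ParallelCfCBlock(nn.Module):
    """CfC Eq.10 applied per pixel; all spatial positions gate in parallel."""
    def __init__(self, dim, t_dim):
        super().__init__()
        self.f = nn.Conv2d(dim, dim, 1)
        self.g = nn.Conv2d(dim, dim, 1)
        self.h = nn.Conv2d(dim, dim, 1)
        self.time_a = nn.Linear(t_dim, dim)
        self.time_b = nn.Linear(t_dim, dim)
        self.log_lam = nn.Parameter(torch.zeros(dim))  # λ > 0 via exp

    def forward(self, x, t_emb):
        a = self.time_a(t_emb)[:, :, None, None]
        b = self.time_b(t_emb)[:, :, None, None]
        gate = torch.sigmoid(a * self.f(x) - b)          # liquid time-gating
        out = gate * self.g(x) + (1 - gate) * self.h(x)  # CfC interpolation
        lam = self.log_lam.exp()[None, :, None, None]
        # |t_emb| reduced to one scalar per sample (assumption)
        alpha = torch.exp(-lam * t_emb.abs().mean(dim=1)[:, None, None, None])
        return alpha * x + (1 - alpha) * out             # time-aware residual
```

Because every position is gated independently, there is no sequential scan; the whole feature map is processed in one pass.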

Verified Datasets

All tested and working (with streaming support):

| Dataset | Images | Description | Native Resolution |
|---|---|---|---|
| huggan/AFHQv2 | 16K | Animal faces (cats, dogs, wildlife) | 512×512 |
| nielsr/CelebA-faces | 202K | Celebrity faces | 178×218 |
| huggan/flowers-102-categories | 8K | Flower photographs | Variable |
| reach-vb/pokemon-blip-captions | 833 | Pokemon illustrations | 1280×1280 |
| huggan/anime-faces | 63K | Anime faces | 64×64 |
| Norod78/cartoon-blip-captions | ~3K | Cartoon characters | 512×512 |

VAE

Uses stabilityai/sd-vae-ft-mse (83.7M params, frozen during training):

  • 4 latent channels, 8× spatial downscale
  • PSNR 27.3 on LAION-Aesthetics (excellent reconstruction)
  • ~160MB VRAM in fp16
  • Scaling factor: 0.18215
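A quick sanity check on the numbers above, as plain arithmetic: the fp16 footprint follows from 2 bytes per parameter, and the latent shape follows from the 8× downscale with 4 channels.

```python
# fp16 footprint of the frozen VAE: 2 bytes per parameter
params = 83.7e6
fp16_mib = params * 2 / 2**20
print(round(fp16_mib))          # -> 160, matching "~160MB"

# 8x spatial downscale of a 256px image, 4 latent channels
h = w = 256
latent_shape = (4, h // 8, w // 8)
print(latent_shape)             # -> (4, 32, 32)
```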

Model Configs

| Config | Params | 256px VRAM (w/ VAE) | 512px VRAM |
|---|---|---|---|
| tiny | ~23M | ~6 GB | ~12 GB |
| small | ~69M | ~10 GB | ~20 GB |
| base | ~154M | ~16 GB | ~30 GB |

Training

Objective: Rectified Flow, a simple MSE loss on predicted velocity

x_t = (1 - t) · x0 + t · noise       # linear interpolation
v_target = noise - x0                # constant velocity
loss = MSE(model(x_t, t), v_target)  # that's it!
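The objective above fits in one small function. The model signature `model(x_t, t)` and sampling t uniformly on [0, 1] are assumptions consistent with the pseudocode, not details taken from the repository:

```python
import torch
import torch.nn.functional as F

def rectified_flow_loss(model, x0):
    """One Rectified Flow training step on a batch of latents x0."""
    t = torch.rand(x0.shape[0], device=x0.device)  # t ~ U(0, 1) per sample
    tb = t[:, None, None, None]                    # broadcast over C, H, W
    noise = torch.randn_like(x0)
    x_t = (1 - tb) * x0 + tb * noise               # linear interpolation
    v_target = noise - x0                          # constant velocity
    return F.mse_loss(model(x_t, t), v_target)
```

There is no noise schedule to tune; the only randomness is the uniform t and the Gaussian noise.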

Sampling: Euler ODE integration, 25-50 steps
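A minimal Euler sampler under these conventions might look as follows. It assumes the convention implied by the training objective: t = 1 is pure noise, t = 0 is data, and the model predicts v = noise - x0, so stepping toward t = 0 subtracts dt · v:

```python
import torch

@torch.no_grad()
def sample(model, shape, steps=25, device="cpu"):
    """Euler integration of dx/dt = v from t=1 (noise) down to t=0 (data)."""
    x = torch.randn(shape, device=device)          # start from pure noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), 1.0 - i * dt, device=device)
        v = model(x, t)                            # predicted velocity
        x = x - dt * v                             # Euler step toward t=0
    return x                                       # latent; decode with the VAE
```

The returned tensor is a latent; dividing by the 0.18215 scaling factor and decoding with the frozen VAE yields the final image.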

References

| Paper | Contribution |
|---|---|
| CfC Networks (Nature MI 2022) | CfC Eq.10, parallelizable closed form |
| LTC Networks (AAAI 2021) | Liquid time-constant ODE |
| LiquidTAD (2024) | Parallel liquid relaxation |
| USM (CVPR 2025) | U-Net + SSM for diffusion |
| DiffuSSM (2023) | SSM replaces attention in diffusion |
| Rectified Flow (ICLR 2023) | Simple velocity training |

Files

├── liquid_diffusion/
│   ├── __init__.py
│   ├── model.py             # Full model architecture
│   └── trainer.py           # Trainer + dataset utilities
├── LiquidDiffusion_Training.ipynb  # Complete Colab notebook
├── test_model.py
└── README.md

License

MIT
