Z-Image-Turbo Hosted
Overview
This repository hosts a fine-tuned version of the Z-Image-Turbo model, specifically the training adapter from ostris/zimage_turbo_training_adapter. The original Z-Image-Turbo was developed by Tongyi-MAI and is available at Tongyi-MAI/Z-Image-Turbo.
Why This Model?
Z-Image-Turbo is a state-of-the-art text-to-image diffusion model based on a Single-Stream Diffusion Transformer (S3-DiT) architecture. It offers several advantages:
- Efficiency: Distilled to run with only 8 function evaluations (NFEs), enabling sub-second inference on high-end GPUs.
- Quality: Excels in photorealistic image generation, bilingual text rendering (English and Chinese), and prompt adherence.
- Scalability: Supports resolutions up to 1024x1024 pixels.
- Compatibility: Runs with guidance_scale=0.0 (no classifier-free guidance), reducing computational overhead.
We chose this model for our project due to its balance of speed and quality, making it ideal for real-time applications and local inference on consumer hardware like the RTX 3090.
The training adapter enhances the base model by providing fine-tuned weights for specific use cases, improving adaptability without retraining from scratch.
Technical Details
Model Architecture
- Base Model: Z-Image-Turbo (6B parameters)
- Architecture: Single-Stream Diffusion Transformer (S3-DiT)
- Training Data: Not specified in public docs, but likely large-scale image-text pairs for photorealism.
- Quantization: The hosted version supports quantization for reduced memory usage (e.g., 8-bit or 4-bit using bitsandbytes).
Hosting Process
- Selection: Identified Z-Image-Turbo as the best fit for our needs, based on benchmarks showing a better speed/quality trade-off than models like FLUX or SDXL.
- Source: Used the training adapter from ostris for pre-fine-tuned weights.
- Authentication: Logged into Hugging Face using a personal access token.
- Repository Creation: Created a new model repository on Hugging Face.
- Download: Downloaded all model files (safetensors, config, etc.) from the source repo.
- Upload: Uploaded the files to the new repo using the Hugging Face Hub API (see the sketch after this list).
- Documentation: Added this README with citations to original authors.
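The download and upload steps above can be scripted with the huggingface_hub client. A minimal sketch, assuming a personal access token; the commit message and exact arguments are illustrative, not a record of the actual commands used:

```python
from huggingface_hub import login, create_repo, snapshot_download, upload_folder

# Authenticate with a personal access token
login(token="hf_...")

# Create the target model repository (no-op if it already exists)
create_repo("RayyanAhmed9477/Z-Image-Turbo-Hosted", repo_type="model", exist_ok=True)

# Download all files (safetensors, configs, etc.) from the source repo
local_dir = snapshot_download("ostris/zimage_turbo_training_adapter")

# Upload the downloaded files to the new repository
upload_folder(
    repo_id="RayyanAhmed9477/Z-Image-Turbo-Hosted",
    folder_path=local_dir,
    commit_message="Host Z-Image-Turbo training adapter",
)
```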
Quantization Techniques
To enable local inference on hardware with limited VRAM, we support various quantization methods:
BitsandBytes (Recommended):
- 8-bit: Reduces memory by ~50%, minimal quality loss.
- 4-bit: Reduces memory to roughly 25% of full precision, with NF4 or FP4 configurations (see the sketch after the 8-bit example below).
- Code:
```python
import torch
from diffusers import ZImagePipeline
from transformers import BitsAndBytesConfig

# 8-bit quantization; set load_in_4bit=True instead for the 4-bit variant
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

pipe = ZImagePipeline.from_pretrained(
    "RayyanAhmed9477/Z-Image-Turbo-Hosted",
    quantization_config=quantization_config,
)
```
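For the 4-bit NF4/FP4 variants mentioned above, a minimal sketch of the corresponding configuration; it is passed as quantization_config exactly as in the 8-bit example, and the bfloat16 compute dtype shown here is an assumption:

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit quantization with the NF4 data type and bfloat16 compute
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",  # or "fp4"
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```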
GGUF Quantization:
- For extreme low-VRAM (4GB+), use stable-diffusion.cpp with GGUF versions.
- Download from community repos like jayn7/Z-Image-Turbo-GGUF.
FP8 Quantization:
- 8-bit float for balanced performance.
- Available in repos like T5B/Z-Image-Turbo-FP8.
Benchmarks and Comparisons
- vs. FLUX: Z-Image-Turbo offers faster inference (8 NFEs vs. FLUX's 28-50) with comparable quality for photorealism.
- vs. SDXL: Better prompt adherence and bilingual support; distilled for efficiency.
- Performance on RTX 3090:
- Full precision: 5-10s per image, 12GB VRAM.
- 8-bit quantized: 6-8s, 6GB VRAM.
- Quality drop: <5%, barely perceptible.
Installation Guide
Install dependencies:
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install git+https://github.com/huggingface/diffusers
pip install transformers accelerate bitsandbytes
```
Load and run:
```python
import torch
from diffusers import ZImagePipeline

pipe = ZImagePipeline.from_pretrained(
    "RayyanAhmed9477/Z-Image-Turbo-Hosted",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    prompt="A futuristic cityscape",
    height=1024,
    width=1024,
    num_inference_steps=9,
    guidance_scale=0.0,
).images[0]
image.save("output.png")
```
For UI: Use Gradio for a web interface (see the example below).
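A minimal Gradio sketch for that web interface; it reuses the pipe object loaded above, and the prompt-only layout is an illustrative assumption rather than part of the original project:

```python
import gradio as gr

def generate(prompt):
    # Same settings as the example above: 1024x1024, 9 steps, no CFG
    return pipe(
        prompt=prompt,
        height=1024,
        width=1024,
        num_inference_steps=9,
        guidance_scale=0.0,
    ).images[0]

demo = gr.Interface(fn=generate, inputs="text", outputs="image", title="Z-Image-Turbo Hosted")
demo.launch()
```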
System Requirements
- GPU: NVIDIA with at least 16GB VRAM (e.g., RTX 3090)
- RAM: 64GB recommended
- Software: Python 3.8+, PyTorch 2.0+, diffusers library
- OS: Windows/Linux with CUDA 11.8+
Performance
- Inference Time: ~5-10 seconds per 1024x1024 image on RTX 3090
- Memory Usage: ~12GB (bfloat16), reducible with quantization
- Throughput: ~0.1-0.2 images/second
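The figures above can be sanity-checked with a simple timing loop. A minimal sketch, assuming pipe is loaded as in the installation guide; the warm-up pass and the small prompt set are measurement choices of this sketch, not part of the original benchmark:

```python
import time
import torch

prompts = ["A futuristic cityscape", "A portrait of an astronaut", "A bowl of ramen"]

# Warm-up pass so CUDA kernels and caches are initialized before timing
pipe(prompt=prompts[0], height=1024, width=1024, num_inference_steps=9, guidance_scale=0.0)

start = time.perf_counter()
for p in prompts:
    pipe(prompt=p, height=1024, width=1024, num_inference_steps=9, guidance_scale=0.0)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

print(f"{elapsed / len(prompts):.2f} s/image ({len(prompts) / elapsed:.2f} images/s)")
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```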
Troubleshooting
- Out of Memory: Use quantization or CPU offloading (pipe.enable_model_cpu_offload()).
- Slow Inference: Enable Flash Attention (pipe.transformer.set_attention_backend("flash")) or compile the model (pipe.transformer.compile()).
- Quality Issues: Increase num_inference_steps or use higher precision.
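A consolidated sketch of these options, assuming pipe is loaded as above and that your diffusers build exposes set_attention_backend; apply only the lines that match your bottleneck rather than all of them at once:

```python
# Out of memory: offload idle submodules to CPU between steps (trades speed for VRAM)
pipe.enable_model_cpu_offload()

# Slow inference: faster attention backend (requires flash-attn) and torch.compile
pipe.transformer.set_attention_backend("flash")
pipe.transformer.compile()
```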
Citations
- Original Model: Tongyi-MAI. "Z-Image-Turbo." Hugging Face, https://huggingface.co/Tongyi-MAI/Z-Image-Turbo.
- Training Adapter: ostris. "zimage_turbo_training_adapter." Hugging Face, https://huggingface.co/ostris/zimage_turbo_training_adapter.
Hosted by RayyanAhmed9477, with full credit to the original creators.
License
Refer to the original repositories for licensing information.
tags:
- text-to-image
- diffusion
- z-image-turbo
- photorealism
- quantized