---
license: apache-2.0
library_name: videox_fun
---
# Z-Image-Turbo-Fun-Controlnet-Union
[GitHub: VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun)
## Model Features
- This ControlNet is applied to 6 blocks of the base model.
- The model was trained from scratch for 10,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at 1328 resolution using BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10.
- It supports multiple control conditions, including Canny, HED, Depth, Pose, and MLSD, and can be used like a standard ControlNet (see the preprocessing sketch after this list).
- You can adjust `control_context_scale` for stronger control and better detail preservation; the optimal range is 0.65 to 0.80. For better stability, we highly recommend using a detailed prompt.
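
Control images such as the Canny map shown in the results below can be produced with standard preprocessors. The snippet below is a minimal sketch using OpenCV; it is not part of this repository, and the thresholds and file names are illustrative placeholders.

```python
# Minimal sketch (not part of this repository): build a Canny control image
# with OpenCV before passing it to the ControlNet pipeline.
import cv2
import numpy as np
from PIL import Image

def make_canny_control(image_path: str, low: int = 100, high: int = 200) -> Image.Image:
    """Return a 3-channel Canny edge map usable as a control image."""
    image = cv2.imread(image_path)                      # BGR uint8
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)      # Canny expects single channel
    edges = cv2.Canny(gray, low, high)                  # edge map, uint8
    edges = np.stack([edges] * 3, axis=-1)              # replicate to 3 channels
    return Image.fromarray(edges)

# Example usage with placeholder file names
control_image = make_canny_control("input.jpg")
control_image.save("canny.jpg")
```
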
## TODO
- [ ] Train on more data and for more steps.
- [ ] Support inpaint mode.
## Results
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Pose</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/pose2.jpg" width="100%" /></td>
<td><img src="results/pose2.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Pose</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/pose.jpg" width="100%" /></td>
<td><img src="results/pose.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Canny</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/canny.jpg" width="100%" /></td>
<td><img src="results/canny.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>HED</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/hed.jpg" width="100%" /></td>
<td><img src="results/hed.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Depth</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/depth.jpg" width="100%" /></td>
<td><img src="results/depth.png" width="100%" /></td>
</tr>
</table>
## Inference
For full usage details, see the [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun) repository. First, clone the repository and create the required model directories:
```sh
# Clone the code
git clone https://github.com/aigc-apps/VideoX-Fun.git
# Enter VideoX-Fun's directory
cd VideoX-Fun
# Create model directories
mkdir -p models/Diffusion_Transformer
mkdir -p models/Personalized_Model
```
Then download the weights into `models/Diffusion_Transformer` and `models/Personalized_Model`, arranged as follows:
```
📦 models/
├── 📂 Diffusion_Transformer/
│   └── 📂 Z-Image-Turbo/
├── 📂 Personalized_Model/
│   └── 📦 Z-Image-Turbo-Fun-Controlnet-Union.safetensors
```
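
If you prefer to fetch the weights from Python, a sketch using `huggingface_hub` could look like the following. The repo IDs below are placeholders (an assumption, not taken from this card); substitute the actual repositories hosting the base model and this ControlNet.

```python
# Sketch: download weights into the directory layout above with huggingface_hub.
from huggingface_hub import snapshot_download, hf_hub_download

# Base Z-Image-Turbo weights -> models/Diffusion_Transformer/Z-Image-Turbo
snapshot_download(
    repo_id="<base-model-repo-id>",  # placeholder, replace with the real repo id
    local_dir="models/Diffusion_Transformer/Z-Image-Turbo",
)

# ControlNet weights -> models/Personalized_Model
hf_hub_download(
    repo_id="<controlnet-repo-id>",  # placeholder, replace with the real repo id
    filename="Z-Image-Turbo-Fun-Controlnet-Union.safetensors",
    local_dir="models/Personalized_Model",
)
```
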
Then run the file `examples/z_image_fun/predict_t2i_control.py`.