SCFDepth: A Single-Step Coarse-to-Fine Diffusion Framework for Monocular Depth Estimation
This repository is based on Marigold, CVPR 2024 Best Paper: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Haruko386, Shuai Yuan, Mingbo Lei, Yibo Chen
We present SCFDepth, a diffusion model and associated fine-tuning protocol for monocular depth estimation, built on Marigold. Its core innovation addresses the limited feature representation capability of diffusion models. Like Marigold, SCFDepth is derived from Stable Diffusion and fine-tuned on synthetic data (Hypersim and VKitti); it achieves good results in refining object edges.
📢 News
- 2025-10-25: Inspired by DepthMaster, we propose a two-stage loss function training strategy based on Apepth V1-0. In the first stage, we perform foundational training using MSE loss. In the second stage, we learn edge structures through FFT loss. Based on this, we introduce Apepth V1-1.
- 2025-10-09: We propose a novel diffusion-based depth estimation framework guided by pre-trained models.
- 2025-09-23: We change Marigold from stochastic multi-step generation to deterministic one-step perception.
- 2025-08-10: Trying some optimizations in feature representation.
- 2025-05-08: Cloned Marigold locally.
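The two-stage strategy in the 2025-10-25 entry can be sketched roughly as follows. This is a minimal illustration under assumptions, not the training code of this repository: the exact form of the FFT loss (here, the mean absolute difference of 2D spectra), the hard switch between stages, and the names `fft_loss`, `two_stage_loss`, and `switch_step` are all hypothetical.

```python
import numpy as np


def mse_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Stage 1: mean squared error in the spatial domain."""
    return float(np.mean((pred - target) ** 2))


def fft_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Stage 2 (assumed form): mean absolute difference of the 2D spectra."""
    pred_f = np.fft.fft2(pred)
    target_f = np.fft.fft2(target)
    return float(np.mean(np.abs(pred_f - target_f)))


def two_stage_loss(pred: np.ndarray, target: np.ndarray,
                   step: int, switch_step: int) -> float:
    """Use MSE before `switch_step` (hypothetical), the FFT loss afterwards."""
    if step < switch_step:
        return mse_loss(pred, target)
    return fft_loss(pred, target)
```

Comparing spectra penalizes errors in high-frequency components, which is one plausible reason a frequency-domain loss helps with edge structures.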
🚀 Usage
We offer several ways to interact with SCFDepth:
Local development instructions with this codebase are given below.
🛠️ Setup
The model was trained on:
- Ubuntu 22.04 LTS, Python 3.12.9, CUDA 11.8, NVIDIA RTX 6000 Ada Generation
The inference code was tested on:
- Ubuntu 22.04 LTS, Python 3.12.9, CUDA 11.8, NVIDIA GeForce RTX 4090
🪧 A Note for Windows users
We recommend running the code in WSL2:
- Install WSL following the installation guide.
- Install CUDA support for WSL following the installation guide.
- Find your drives in /mnt/<drive letter>/; check the WSL FAQ for more details. Navigate to the working directory of choice.
📦 Repository
Clone the repository (requires git):
git clone https://github.com/Dimon0000000/SCFDepth.git
cd SCFDepth
💻 Dependencies
Using Conda, create an environment and install the dependencies into it:
conda create -n SCFDepth python==3.12.9
conda activate SCFDepth
pip install -r requirements.txt
Keep the environment activated before running the inference script. Activate the environment again after restarting the terminal session.
🏃 Testing on your images
📷 Prepare images
Use the selected images under input, or place your images in a directory, for example under input/test-image, and run the following inference command.
🎮 Run inference with paper setting
This setting corresponds to our paper. For academic comparison, please run with this setting.
python run.py \
--checkpoint prs-eth/marigold-v1-0 \
--ensemble_size 1 \
--input_rgb_dir input/in-the-wild_example \
--output_dir output/in-the-wild_example
You can find all results in output/in-the-wild_example. Enjoy!
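Before running the command on your own folder, it can help to confirm that the input directory actually contains images. The helper below is a small sketch for that purpose; `list_input_images` and the extension set are assumptions for illustration and may not match what run.py accepts.

```python
from pathlib import Path

# Assumed set of extensions; check run.py for the ones it actually accepts.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_input_images(input_dir: str) -> list[Path]:
    """Return image files directly inside `input_dir`, sorted by path."""
    root = Path(input_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

For example, `list_input_images("input/test-image")` returns an empty list if the directory is missing or holds no files with the assumed extensions.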
⚙️ Inference settings
The default settings are optimized for the best result. However, the behavior of the code can be customized:
Trade-off between accuracy and speed (larger values result in better accuracy at the cost of slower inference):
- --ensemble_size: Number of inference passes in the ensemble.
By default, the inference script resizes input images to the processing resolution, and then resizes the prediction back to the original resolution. This gives the best quality, as Stable Diffusion, from which SCFDepth is derived, performs best at 768x768 resolution:
- --processing_res: The processing resolution; set to 0 to process the input resolution directly. When unassigned (None), the default setting is read from the model config. Default: None.
- --output_processing_res: Produce output at the processing resolution instead of upsampling it to the input resolution. Default: False.
- --resample_method: The resampling method used to resize images and depth predictions. This can be one of bilinear, bicubic, or nearest. Default: bilinear.
Other options:
- --half_precision or --fp16: Run with half precision (16-bit float) for faster speed and reduced VRAM usage; this might lead to suboptimal results.
- --seed: Random seed, which can be set to improve reproducibility. Default: None (unseeded). Note: forcing --batch_size 1 helps to increase reproducibility. To ensure full reproducibility, deterministic mode needs to be used.
- --batch_size: Batch size of repeated inference. Default: 0 (best value determined automatically).
- --color_map: Colormap used to colorize the depth prediction. Default: Spectral. Set to None to skip colored depth map generation.
- --apple_silicon: Use Apple Silicon MPS acceleration.
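To make the role of the processing resolution concrete, the sketch below computes the size an image would be resized to before inference. It assumes the longer edge is scaled to the processing resolution with the aspect ratio preserved; the function name `processing_size` is hypothetical and the actual resizing logic in the inference script may differ.

```python
def processing_size(width: int, height: int,
                    processing_res: int) -> tuple[int, int]:
    """Scale (width, height) so the longer edge equals `processing_res`.

    A value of 0 keeps the input resolution, mirroring the documented
    behavior of --processing_res 0. The longer-edge rule is an assumption.
    """
    if processing_res == 0:
        return width, height
    scale = processing_res / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Under this assumption, a 1024x768 input with a processing resolution of 768 is processed at 768x576, and the prediction is resized back to 1024x768 unless --output_processing_res is set.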
⬇ Checkpoint cache
By default, the checkpoint is stored in the Hugging Face cache.
The HF_HOME environment variable defines its location and can be overridden, e.g.:
export HF_HOME=$(pwd)/cache
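To check which directory will be used, the lookup can be mirrored in a few lines of Python. This is a sketch of the documented Hugging Face convention (HF_HOME, falling back to the huggingface folder under XDG_CACHE_HOME or ~/.cache); the helper name `hf_home` is hypothetical and the code does not call the Hugging Face libraries themselves.

```python
import os
from pathlib import Path


def hf_home() -> Path:
    """Return the Hugging Face home directory, honoring HF_HOME if set."""
    # Fallback: $XDG_CACHE_HOME/huggingface, or ~/.cache/huggingface.
    cache_root = Path(os.environ.get("XDG_CACHE_HOME", Path.home() / ".cache"))
    default = cache_root / "huggingface"
    return Path(os.environ.get("HF_HOME", default)).expanduser()
```

Downloaded model files normally end up in the hub subfolder of this directory.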
At inference, specify the checkpoint path:
python run.py \
--checkpoint checkpoints/SCFDepth \
--ensemble_size 1 \
--input_rgb_dir input/in-the-wild_example \
--output_dir output/in-the-wild_example
🦿 Evaluation on test datasets
Install additional dependencies:
pip install -r requirements+.txt -r requirements.txt
Set data directory variable (also needed in evaluation scripts) and download evaluation datasets into corresponding subfolders:
export BASE_DATA_DIR=<YOUR_DATA_DIR> # Set target data directory
wget -r -np -nH --cut-dirs=4 -R "index.html*" -P ${BASE_DATA_DIR} https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset/
Run inference and evaluation scripts, for example:
# Run inference
bash script/eval/11_infer_nyu.sh
# Evaluate predictions
bash script/eval/12_eval_nyu.sh
Alternatively, use the following script to evaluate all datasets.
bash script/eval/00_test_all.sh
You can find the results under output/eval.
Although the seed has been set, the results might still be slightly different on different hardware.
Evaluating results
Only the U-Net is updated and saved during training. To use the inference pipeline with your training result, replace the unet folder in the train_SCFDepth checkpoints with the one from the checkpoint output folder. Then refer to this section for evaluation.
Although random seeds have been set, the training result might differ slightly across hardware. It is recommended to train without interruption.
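The folder swap described above can be scripted. The following is a minimal sketch: the function name `install_trained_unet` and both directory arguments are hypothetical, and it assumes each checkpoint directory holds its U-Net weights in a subfolder named unet.

```python
import shutil
from pathlib import Path


def install_trained_unet(training_ckpt_dir: str, pipeline_dir: str) -> Path:
    """Replace `<pipeline_dir>/unet` with `<training_ckpt_dir>/unet`."""
    src = Path(training_ckpt_dir) / "unet"
    dst = Path(pipeline_dir) / "unet"
    if not src.is_dir():
        raise FileNotFoundError(f"no unet folder in {src.parent}")
    if dst.exists():
        # Remove the old weights so no stale files remain next to the new ones.
        shutil.rmtree(dst)
    shutil.copytree(src, dst)
    return dst
```

Copying rather than moving keeps the training output intact, so the same checkpoint can be installed again later.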
✏️ Contributing
Please refer to this instruction.
🤔 Troubleshooting
| Problem | Solution |
|---|---|
| (Windows) Invalid DOS bash script on WSL | Run dos2unix <script_name> to convert script format |
| (Windows) error on WSL: Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory | Run export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH |
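If dos2unix is not installed, the line-ending conversion from the first row can be approximated in Python. This is a sketch of the general idea (replacing CRLF with LF), not a full replacement for dos2unix; the helper name `crlf_to_lf` is hypothetical.

```python
from pathlib import Path


def crlf_to_lf(script_path: str) -> bool:
    """Rewrite the file with LF line endings; return True if it changed."""
    path = Path(script_path)
    data = path.read_bytes()
    converted = data.replace(b"\r\n", b"\n")
    if converted != data:
        path.write_bytes(converted)
        return True
    return False
```

For example, `crlf_to_lf("script/eval/11_infer_nyu.sh")` would convert that script in place and report whether anything was rewritten.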
🎓 Citation
Please cite our paper:
@InProceedings{haruko26scfdepth,
title={SCFDepth: A Single-Step Coarse-to-Fine Diffusion Framework for Monocular Depth Estimation},
author={Haruko386 and Yuan Shuai},
booktitle = {Under review},
year={2026}
}
🎫 License
This work is licensed under the Apache License, Version 2.0 (as defined in the LICENSE).
By downloading and using the code and model you agree to the terms in the LICENSE.
