HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios

Official repository of the paper "Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios" (AAAI 2026).

HQ-SVC is an efficient framework for high-quality zero-shot singing voice conversion (SVC) in low-resource scenarios. It achieves disentanglement of content and speaker features via a unified decoupled codec, and enhances synthesis quality through multi-feature fusion and progressive optimization.

Unlike existing methods that demand large datasets or heavy computational resources, HQ-SVC unifies:

  • 🚀 Zero-shot conversion for unseen speakers without fine-tuning
  • ⚡ Low-resource training (a single consumer-grade GPU and under 80 hours of data)
  • 🎧 Dual capabilities: high-quality singing voice conversion + voice super-resolution
  • 🎯 Superior naturalness and speaker similarity compared to SOTA methods

🗞 News

  • [2025-11-08] 🎉 Paper accepted by AAAI 2026
  • [2025-11-12] 🎉 arXiv paper released
  • [2025-11-12] 🎉 Demo released
  • [2025-12-24] 🎉 Inference code and pre-trained models released

📅 Release Plan

  • arXiv preprint
  • Online demo
  • Inference code
  • Pre-trained models
  • Training code

✨ New features

  • Singing style control
  • Improved quality

🎸 Try Inference

1. Download Code and Environment

  • Tested only on Linux platforms with CUDA >= 11.8

  • Windows users can use WSL (Ubuntu) for deployment and execution
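
A quick preflight check can confirm that a machine matches the tested setup before anything is downloaded. The sketch below is an illustration, not part of the repository: version_ge and is_linux are hypothetical helpers, version_ge relies on sort -V from GNU coreutils, and the CUDA version passed to it has to be read from your own tooling.

```shell
# Hypothetical helper (not part of HQ-SVC): succeeds when version $1 >= version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: succeeds when the kernel reports Linux (WSL also reports Linux).
is_linux() {
    [ "$(uname -s)" = "Linux" ]
}
```

For example, version_ge 12.1 11.8 succeeds and version_ge 11.7 11.8 fails, so the first can gate the rest of the setup.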

git clone https://github.com/ShawnPi233/HQ-SVC.git
cd HQ-SVC
wget -c https://huggingface.co/shawnpi/HQ-SVC/resolve/main/environment.tar.gz
wget -c https://hf-mirror.com/shawnpi/HQ-SVC/resolve/main/environment.tar.gz # optional mirror; use it if huggingface.co is unreachable
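
The two download URLs above are alternatives, so one way to script the step is to try them in order and stop at the first success. This is a minimal sketch under assumptions: download_first_available is a hypothetical helper, and the DOWNLOADER variable is an illustrative override hook (defaulting to wget -c) that is not defined anywhere in the repository.

```shell
# Hypothetical helper (not part of HQ-SVC): try each URL in order until one succeeds.
# DOWNLOADER is an illustrative override hook; it defaults to "wget -c".
download_first_available() {
    for url in "$@"; do
        # Left unquoted on purpose so "wget -c" splits into command and flag.
        if ${DOWNLOADER:-wget -c} "$url"; then
            return 0
        fi
        echo "download failed: $url" >&2
    done
    return 1
}
```

Calling download_first_available with the huggingface.co URL first and the hf-mirror.com URL second falls back to the mirror only when the primary download fails.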

2. Unzip Environment

mkdir -p venv
tar -xzf environment.tar.gz -C venv
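
Before activating, it can help to confirm that the archive unpacked completely. The check below is a sketch: check_env is a hypothetical helper, and the presence of bin/python inside the archive is an assumption (only bin/activate is confirmed by the next step).

```shell
# Hypothetical helper (not part of HQ-SVC): confirm the unpacked environment
# contains the files the following steps rely on.
# Usage: check_env [env_dir]   (env_dir defaults to "venv")
check_env() {
    env_dir="${1:-venv}"
    if [ ! -f "$env_dir/bin/activate" ]; then
        echo "missing $env_dir/bin/activate; re-run the tar command" >&2
        return 1
    fi
    # Assumption: the archive ships its own interpreter at bin/python.
    if [ ! -x "$env_dir/bin/python" ]; then
        echo "missing or non-executable $env_dir/bin/python" >&2
        return 1
    fi
    echo "environment looks complete: $env_dir"
}
```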

3. Activate Environment

source venv/bin/activate

4. Download Pretrained Models

export HF_HUB_ENABLE_HF_TRANSFER=0
huggingface-cli download shawnpi/HQ-SVC --include "utils/pretrain/*" --local-dir . --local-dir-use-symlinks False
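
If huggingface.co is slow or unreachable, the same command can be pointed at a mirror through the HF_ENDPOINT environment variable, which huggingface_hub reads to redirect Hub requests. The wrapper below is a sketch: download_weights is a hypothetical helper, and using hf-mirror.com as the endpoint is an assumption based on the mirror URL given for the environment archive.

```shell
# Hypothetical helper (not part of HQ-SVC): run the weight download,
# optionally through a mirror endpoint.
# Usage: download_weights [endpoint_url]
download_weights() {
    (
        # The subshell keeps the exported variables from leaking into the caller.
        if [ -n "${1:-}" ]; then
            export HF_ENDPOINT="$1"
        fi
        export HF_HUB_ENABLE_HF_TRANSFER=0
        huggingface-cli download shawnpi/HQ-SVC \
            --include "utils/pretrain/*" \
            --local-dir . \
            --local-dir-use-symlinks False
    )
}
```

Running download_weights with no argument uses the default endpoint; download_weights https://hf-mirror.com routes the request through the mirror.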

5. Running

python gradio_app.py
  • If you encounter the error Caught signal 11 (Segmentation fault: address not mapped to object at address (nil)),
  • run the following command first, then start the app again:
unset LD_LIBRARY_PATH
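
Unsetting the variable affects the whole shell session. If other tools in the same session need LD_LIBRARY_PATH, an alternative is to drop it for a single command with env -u. The wrapper name run_without_ld_path is hypothetical and used only for illustration.

```shell
# Hypothetical helper (not part of HQ-SVC): run one command with
# LD_LIBRARY_PATH removed, leaving the current shell's value untouched.
run_without_ld_path() {
    env -u LD_LIBRARY_PATH "$@"
}
```

With this in place, run_without_ld_path python gradio_app.py starts the app without changing the surrounding environment.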

Zero-shot Super-Resolution (16 kHz to 44.1 kHz): Input only source audio

Zero-shot Singing Voice Conversion: Input both source audio and target audio

📜 Citation

If you use HQ-SVC in your research, please cite our work:

@article{bai2025hq,
  title={HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios},
  author={Bai, Bingsong and Geng, Yizhong and Wang, Fengping and Wang, Cong and Guo, Puyuan and Gao, Yingming and Li, Ya},
  journal={arXiv preprint arXiv:2511.08496},
  year={2025}
}

🙏 Acknowledgement

We thank the open-source communities behind:
