AMD support
Is AMD support possibly coming? I know the main team doesn't have any consumer AMD cards. On comfyui, you have to run it with --cpu_vae, and that takes way too long.
maybe try without using comfyui
maybe try without using comfyui
Yeah, I didn't succeed getting the gradio thing to work, but I think you have to be a gradio genius to get it to work with amd lol so that may not mean it can't work!
What do you mean, the Gradio setup is relatively straightforward
Is AMD support possibly coming? I know the main team doesn't have any consumer AMD cards. On comfyui, you have to run it with --cpu_vae, and that takes way too long.
I'm running this on AMD Strix Halo using ROCm7.2:
https://github.com/IgnatBeresnev/comfyui-gfx1151
In case you have some other AMD GPU, it may be compatible or at least you can use this container to adapt.
The required workflow is already provided within ComfyUI / Workflows / ACE-Step 1.5 Music Generation AIO
"Standard Installation (All Platforms)" on their github page worked no problem. And it even runs in cpu mode only. I don't mind waiting several minutes for this goodness.
And it's even better like this, because i can use my computer while its generating, cpu only 50% usage, and no vram usage
GRadio works fine on RX7900 XTX AMD (24gb vram)on Ubuntu 24.04 LTS. Just create a venv in python3.11, install torch and torchaudio from pytorch.org ROCm7.1 nightly and after that the requirements.txt of the git repository. Once done you can run python -m acestep.acestep_v15_pipeline --server-name 127.0.0.1 --port 7680 and open it in the browser localhost:7680.
torchcodec must be compiled ( see: https://github.com/vllm-project/vllm/blob/main/tools/install_torchcodec_rocm.sh ) manually if you want to train your own LoRa. I get some errors with training loras atm:
RuntimeError: Input type (CUDABFloat16Type) and weight type (CPUBFloat16Type) should be the same
EDIT: turning off offload to CPU to have everything in the VRAM was the solutions.
These are the commands I run, or you can put them in a bash script, looks like the whole lora training works on AMD now, besides generating AI music :)
Also had to install python3.11-dev with apt install.
set -e
export TORCHCODEC_USE_SYSTEM_FFMPEG=1
export TORCH_CUDNN_ENABLED=0
export PYTORCH_ALLOC_CONF=expandable_segments:True
export PYTORCH_NO_HIP_MEMORY_CACHING=1
venv/bin/python -m acestep.acestep_v15_pipeline
--server-name 0.0.0.0
--port 7680
--init_service True
--device cuda
--backend pt
--offload_dit_to_cpu True
--offload_to_cpu False
Lora training pre-processing of the dataset builder tab of two songs took half an hour, training the lora probably will be finished after 1h30m for 1000 steps.
Nie to już koniec !