Request quantization of VLM

#2
by win10 - opened

I'll see what I can do on some of those. I've actually been working on ERNIE 4.5 VL 28B A3B for a while, but there are still some problems running it through llm-compressor. I've filed an issue on their GitHub about it; I haven't managed to get past the latest tracing problem yet.

We've made a lot of progress on Qwen3-VL-32B-Thinking. I was able to quantize it, and with some fixes from the vLLM team it's currently possible to run it.

https://huggingface.co/Firworks/Qwen3-VL-32B-Thinking-nvfp4

I'll update the model card with the latest command to run it once the vLLM updates are officially released, but if you want to try it now, there's a GitHub issue with details on getting it running with a local copy of vLLM.

https://github.com/vllm-project/vllm/issues/29715
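Until the model card is updated, serving the checkpoint would look something like the sketch below. The exact flags are an assumption on my part (check the issue above for the authoritative steps, including whether a local vLLM build is still required):

```shell
# Sketch only: assumes a vLLM build that already includes the Qwen3-VL fixes
# referenced in the issue above (a local or nightly build may be needed).
pip install -U vllm

# Serve the NVFP4-quantized checkpoint behind vLLM's OpenAI-compatible API.
# --max-model-len is an illustrative value, not a recommendation.
vllm serve Firworks/Qwen3-VL-32B-Thinking-nvfp4 \
  --trust-remote-code \
  --max-model-len 32768
```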

Great! Let me give it a try.
