Request quantization of VLM

#2
by win10 - opened

I'll see what I can do on some of those. I've actually been working on ERNIE 4.5 VL 28B A3B for a while, but there are still some problems running it through llm-compressor. I've filed an issue on their GitHub about it; I haven't managed to get past the latest tracing problem yet.

We've made a lot of progress on Qwen3-VL-32B-Thinking. I was able to quantize it, and with some fixes from the vLLM team it's currently possible to run it.

https://huggingface.co/Firworks/Qwen3-VL-32B-Thinking-nvfp4

I'll update the model card with the latest command to run it once the vLLM updates are officially released, but if you want to try it now, there's a GitHub issue with details on getting it running with a local copy of vLLM.

https://github.com/vllm-project/vllm/issues/29715
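Until the model card is updated, serving the checkpoint would look something like the sketch below. The exact flags are an assumption on my part (check the issue above for the authoritative steps, including whether a local vLLM build is still required):

```shell
# Sketch only: assumes a vLLM build that already includes the Qwen3-VL fixes
# referenced in the issue above (a local or nightly build may be needed).
pip install -U vllm

# Serve the NVFP4-quantized checkpoint behind vLLM's OpenAI-compatible API.
# --max-model-len is an illustrative value, not a recommendation.
vllm serve Firworks/Qwen3-VL-32B-Thinking-nvfp4 \
  --trust-remote-code \
  --max-model-len 32768
```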

Great! Let me give it a try.
