Please release a model with native 4-bit quantization
Since Kimi currently provides an INT4 quantized model, could you deliver a model that is natively quantized to 4-bit precision?
For practical deployment scenarios, many people rely on native 4-bit models for memory and throughput efficiency. Given that Kimi-K2.5 already ships an INT4 variant, it would be helpful if GLM-5 also offered a natively quantized 4-bit model so benchmarks can be compared under equivalent conditions.
Agreed. At FP8, this model no longer fits on 8xH100 or 4xH200 setups.
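To put rough numbers on that, here is a back-of-the-envelope sketch. The parameter count below is a placeholder, not the model's actual size, and KV cache, activations, and framework overhead are ignored:

```python
# Back-of-the-envelope weight-memory estimate (sketch only; ignores KV cache,
# activations, and framework overhead, and assumes dense storage of all params).
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    return num_params * bits_per_param / 8 / 1e9

GPU_HBM_GB = {"8xH100": 8 * 80, "4xH200": 4 * 141}

# NUM_PARAMS is a placeholder value, not the real parameter count of this model.
NUM_PARAMS = 700e9

for name, hbm in GPU_HBM_GB.items():
    for label, bits in [("FP8", 8), ("INT4", 4)]:
        need = weight_memory_gb(NUM_PARAMS, bits)
        fits = "fits" if need < hbm else "does not fit"
        print(f"{name} ({hbm} GB): {label} weights ~ {need:.0f} GB -> {fits}")
```

Halving the bits per weight roughly halves the weight footprint, which is what makes a 4-bit release the difference between fitting and not fitting on these node sizes.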
Indeed, native 4-bit QAT (as in Kimi's case) provides, I think, the best quality-to-size ratio, better than post-training quantization. Great release nonetheless! I guess I will have to wait a bit for 4-bit quants to be able to run it on my hardware.
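For context on the QAT-vs-PTQ point, here is a minimal sketch of what plain round-to-nearest INT4 post-training quantization looks like (not the actual recipe used by Kimi or GLM; the function names are made up for illustration). QAT applies the same rounding inside the training forward pass, so the weights learn to compensate, which is why it usually preserves quality better:

```python
import numpy as np

# Minimal sketch: symmetric round-to-nearest INT4 PTQ with per-output-channel scales.
# In QAT this "fake quantization" would run during training instead of afterwards.
def quantize_int4(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # w: [out_channels, in_channels] weight matrix
    qmax = 7  # symmetric signed 4-bit range [-7, 7]
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(16, 64).astype(np.float32)
q, s = quantize_int4(w)
print(f"mean abs quantization error: {np.abs(w - dequantize(q, s)).mean():.4f}")
```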
Would love this in NVFP4.