- inference-optimization/test_tencentbac_fastmtp
  Updated • 43
- inference-optimization/test_qwen3_next_mtp
  Updated • 47
- inference-optimization/Qwen3-Next-80B-A3B-Instruct_mtp_speculator
  Text Generation • 2B • Updated • 57
- inference-optimization/Qwen3-Next-80B-A3B-Instruct-MTP-ultrachat-epoch3
  2B • Updated • 18
Inference Optimization
community
FP8-block, FP8-dynamic, NVFP4, w4a16, and w8a8 quantized versions of ibm-granite/granite-4.0-h-small and ibm-granite/granite-4.0-h-tiny
models 205
inference-optimization/Qwen3-30B-from-Qwen3-235B_resps-speculators.eagle3-ckpt3
0.5B • Updated
inference-optimization/gpt-oss-120b-from-self-ckpt5-speculator.eagle3
0.9B • Updated
inference-optimization/gpt-oss-120b-from-self-ckpt4-speculator.eagle3
0.9B • Updated
inference-optimization/gpt-oss-120b-from-self-ckpt0-speculator.eagle3
0.9B • Updated
inference-optimization/gpt-oss-120b-from-self-ckpt1-speculator.eagle3
0.9B • Updated
inference-optimization/gpt-oss-120b-from-self-ckpt2-speculator.eagle3
0.9B • Updated
inference-optimization/gpt-oss-120b-from-self-ckpt3-speculator.eagle3
0.9B • Updated
inference-optimization/gpt-oss-120b-from-qwen235b-ckpt3-speculator.eagle3
0.9B • Updated
inference-optimization/gpt-oss-120b-from-qwen235b-ckpt1-speculator.eagle3
0.9B • Updated
inference-optimization/gpt-oss-120b-from-qwen235b-ckpt0-speculator.eagle3
0.9B • Updated