Unlocking On-Policy Distillation for Any Model Family (Space): improve model performance by transferring knowledge between different model families.
makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch (Article, May 7, 2024).
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B (Text Generation model, 33B parameters, updated Feb 24, 2025).