This is a slice of deepseek-ai/DeepSeek-R1 for testing purposes:
- n_routed_experts is reduced from 256 to 16, i.e. there are 16 routed experts per MoE layer
- only layers.2 (dense MLP) and layers.3 (MoE) are kept
This minimizes the model size while keeping the checkpoint usable for testing both the model loader and the MoE forward path.
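A slice like this could be produced by filtering the original checkpoint's state dict: keep only the chosen layers and drop routed experts with index >= 16 (and set `n_routed_experts = 16` in the config accordingly). The sketch below is a minimal illustration of that filtering, assuming the standard `model.layers.N.` / `.experts.N.` key naming of the DeepSeek-R1 checkpoints; it is not the actual script used to build this repo, and the "tensors" here are placeholders.

```python
# Hypothetical sketch: slice a state dict down to two layers and 16 experts.
# Assumes DeepSeek-R1-style key names (model.layers.N..., ...experts.N...).
import re

def slice_state_dict(state_dict, keep_layers=(2, 3), n_experts=16):
    """Keep only the given layers and the first n_experts routed experts."""
    kept = {}
    for name, tensor in state_dict.items():
        layer = re.search(r"model\.layers\.(\d+)\.", name)
        if layer and int(layer.group(1)) not in keep_layers:
            continue  # drop every other transformer layer
        expert = re.search(r"\.experts\.(\d+)\.", name)
        if expert and int(expert.group(1)) >= n_experts:
            continue  # drop routed experts beyond the first n_experts
        kept[name] = tensor
    return kept

# Toy demonstration with placeholder values standing in for tensors:
fake = {
    "model.embed_tokens.weight": 0,                      # kept (not a layer key)
    "model.layers.1.mlp.up_proj.weight": 0,              # dropped (layer 1)
    "model.layers.2.mlp.up_proj.weight": 0,              # kept (dense MLP layer)
    "model.layers.3.mlp.experts.0.up_proj.weight": 0,    # kept (expert 0 < 16)
    "model.layers.3.mlp.experts.200.up_proj.weight": 0,  # dropped (expert >= 16)
}
sliced = slice_state_dict(fake)
```

A real slicing script would additionally update the config (`n_routed_experts`, `num_hidden_layers`) and possibly renumber the kept layers; those details are omitted here.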
Model tree for hugg1ngfac3/deepseek-r1-tiny:
- Base model: deepseek-ai/DeepSeek-R1