This is a slice of deepseek-ai/DeepSeek-R1 for testing purposes:

  1. n_routed_experts is updated from 256 to 16, i.e. there are 16 experts per layer
  2. only layers.2 (dense MLP) and layers.3 (MoE) are kept

This keeps the model size minimal while still letting people use the checkpoint to test model loaders and the MoE forward path; a sketch of how such a slice could be produced follows.
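For reference, producing a slice like this roughly amounts to editing config.json and filtering the safetensors weights by key. Below is a minimal sketch, not the script actually used for this repo: the file names, the `model.layers.N.mlp.experts.M.` key pattern, and the decision to keep the original layer indices are all assumptions. The real DeepSeek-R1 checkpoint is sharded, so a full script would iterate over every shard and rebuild model.safetensors.index.json as well.

```python
import json
from safetensors.torch import load_file, save_file

KEEP_LAYERS = {"2", "3"}   # layers.2 (dense MLP) and layers.3 (MoE)
N_EXPERTS = 16             # reduced from 256

# 1. Shrink the routed-expert count in the config.
with open("config.json") as f:
    config = json.load(f)
config["n_routed_experts"] = N_EXPERTS
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)

# 2. Keep only tensors belonging to the retained layers, plus everything
#    outside the layer stack (embeddings, final norm, lm_head).
def keep(name: str) -> bool:
    if not name.startswith("model.layers."):
        return True
    if name.split(".")[2] not in KEEP_LAYERS:
        return False
    # Within the kept MoE layer, drop routed experts 16..255
    # (assumed key pattern: model.layers.N.mlp.experts.M.<param>).
    if ".mlp.experts." in name:
        expert_idx = int(name.split(".mlp.experts.")[1].split(".")[0])
        return expert_idx < N_EXPERTS
    return True

# Single shard shown for illustration; the real checkpoint has many.
tensors = load_file("model.safetensors")
sliced = {k: v for k, v in tensors.items() if keep(k)}
save_file(sliced, "model-sliced.safetensors")
```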
