🧠Smoller-reason2.1🧠

While making Andy-4-micro, I found that a 1.5B model can learn a lot really well if you give it the right environment. So I decided to take Qwen2.5 1.5B and turn it into a reasoning model, using GRPO as well as material from DeepSeek-R1 and QwQ in PPO training.
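For readers new to GRPO, here is a minimal sketch of the idea behind it: several completions are sampled for the same prompt, and each one is scored relative to the rest of its group instead of by a separate value model. This is a general illustration, not the training code used for this model. The function name, the reward values, and the choice of population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each completion relative to its own sampling group.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. Each reward is centered on the group mean and scaled by the
    group standard deviation (population std is an assumption here).
    `eps` avoids division by zero when every reward in the group is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt
# (for example 1.0 for a correct final answer, 0.0 otherwise).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average get a positive advantage and are reinforced, while those below it get a negative one. If every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.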
