🧠Smoller-reason2.1🧠
While making Andy-4-micro, I found that a 1.5B model can learn a lot really well if you give it the right environment. So I decided to take Qwen2.5 1.5B and turn it into a reasoning model using GRPO, as well as material from DeepSeek-R1 and QwQ in PPO training.
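For anyone unfamiliar with GRPO, the core idea is that several completions are sampled for the same prompt and each one is scored relative to the rest of its group, instead of against a learned value model. Below is a minimal sketch of that group-relative advantage step only. It is not the training code used for this model; the function name, the reward values, the `eps` constant, and the choice of population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each completion against the mean and spread of its own group.

    `rewards` holds one scalar reward per completion sampled from a single
    prompt. `eps` (an assumed value) avoids division by zero when every
    completion in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt
# (e.g. 1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat their group's average get a positive advantage and are reinforced, while below-average ones get a negative advantage. If every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.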