AI & ML interests
post-training, multimodal large language models, generalization
Organizations
None yet
upvoted a paper 4 months ago: DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge
upvoted a paper almost 2 years ago