Igor Kilbas
kaleinaNyan
AI & ML interests
Computer Vision, NLP
Organizations
None yet
Kolibri
A series of English/Russian instruction-following models.
Good RL papers
- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
  Paper • 2508.08221 • Published • 50
- Reinforcement Learning for Reasoning in Large Language Models with One Training Example
  Paper • 2504.20571 • Published • 98
- RLPR: Extrapolating RLVR to General Domains without Verifiers
  Paper • 2506.18254 • Published • 32
JinaJudge
A series of encoder-transformer models for cheap evaluation of LLMs on the Russian Hard LLM Arena.
Eule
A series of English/Russian reasoning models.