Measuring and Mitigating Post-hoc Rationalization in Reverse Chain-of-Thought Generation • Paper • 2602.14469 • Published 17 days ago • 2
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts • Paper • 2602.13367 • Published 20 days ago • 31
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration • Paper • 2602.05400 • Published 28 days ago • 343
SWE-World: Building Software Engineering Agents in Docker-Free Environments • Paper • 2602.03419 • Published 30 days ago • 40
Nemotron-Cascade • Collection • Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models • 14 items • Updated 1 day ago • 53