On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published 3 days ago • 87
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models Paper • 2602.17684 • Published 23 days ago • 21
Rethinking the Trust Region in LLM Reinforcement Learning Paper • 2602.04879 • Published 23 days ago • 35
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition Paper • 2307.13269 • Published Jul 25, 2023 • 34
Diffusion Language Models are Super Data Learners Paper • 2511.03276 • Published Nov 5, 2025 • 129
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70
cwm Collection Collection for Code World Model, an agentic coding model from FAIR. • 3 items • Updated Sep 24, 2025 • 18
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Paper • 2506.20512 • Published Jun 25, 2025 • 47
Reinforcing General Reasoning without Verifiers Paper • 2505.21493 • Published May 27, 2025 • 26
Fostering Video Reasoning via Next-Event Prediction Paper • 2505.22457 • Published May 28, 2025 • 29
Optimizing Anytime Reasoning via Budget Relative Policy Optimization Paper • 2505.13438 • Published May 19, 2025 • 36
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis Paper • 2505.13227 • Published May 19, 2025 • 45
Accelerating LLM Inference: Fast Sampling with Gumbel-Max Trick Article • Oct 24, 2024 • 14