FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning • arXiv:2601.18150 • Published 5 days ago
AACR-Bench: Evaluating Automatic Code Review with Holistic Repository-Level Context • arXiv:2601.19494 • Published 4 days ago
Linear Representations in Language Models Can Change Dramatically over a Conversation • arXiv:2601.20834 • Published 3 days ago
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation • arXiv:2601.20614 • Published 3 days ago
ECO: Quantized Training without Full-Precision Master Weights • arXiv:2601.22101 • Published 1 day ago
FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning • arXiv:2601.19001 • Published 4 days ago
Self-Improving Pretraining: Using Post-Trained Models to Pretrain Better Models • arXiv:2601.21343 • Published 2 days ago
Beyond Imitation: Reinforcement Learning for Active Latent Planning • arXiv:2601.21598 • Published 2 days ago
Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening • arXiv:2601.21590 • Published 2 days ago
Language-Based Trial and Error Falls Behind in the Era of Experience • arXiv:2601.21754 • Published 1 day ago
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation • arXiv:2601.21420 • Published 2 days ago
Scaling Embeddings Outperforms Scaling Experts in Language Models • arXiv:2601.21204 • Published 2 days ago
Knowledge Is Not Enough: Injecting RL Skills for Continual Adaptation • arXiv:2601.11258 • Published 15 days ago
One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment • arXiv:2601.18731 • Published 5 days ago
Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents • arXiv:2601.18217 • Published 5 days ago