Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models Paper • 2602.12036 • Published 1 day ago • 80
The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies Paper • 2602.09877 • Published 3 days ago • 169
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation Paper • 2602.12125 • Published 1 day ago • 53
ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation Paper • 2602.11598 • Published 1 day ago • 2
MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling Paper • 2602.11761 • Published 1 day ago • 5
P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling Paper • 2602.12116 • Published 1 day ago • 3
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing Paper • 2602.12205 • Published 1 day ago • 59
ThinkRouter: Efficient Reasoning via Routing Thinking between Latent and Discrete Spaces Paper • 2602.11683 • Published 1 day ago • 5
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning Paper • 2602.11748 • Published 1 day ago • 24
LawThinker: A Deep Research Legal Agent in Dynamic Environments Paper • 2602.12056 • Published 1 day ago • 31
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning Paper • 2602.12099 • Published 1 day ago • 33
MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models Paper • 2602.10934 • Published 2 days ago • 43
Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation Paper • 2602.05548 • Published 8 days ago • 10
Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards Paper • 2602.10231 • Published 3 days ago • 11
EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies Paper • 2602.09514 • Published 4 days ago • 9
Spend Search Where It Pays: Value-Guided Structured Sampling and Optimization for Generative Recommendation Paper • 2602.10699 • Published 3 days ago • 1
Free(): Learning to Forget in Malloc-Only Reasoning Models Paper • 2602.08030 • Published 5 days ago • 5
Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models Paper • 2602.07106 • Published 7 days ago • 11