Useful Memories Become Faulty When Continuously Updated by LLMs Paper • 2605.12978 • Published 4 days ago • 18
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards Paper • 2605.10899 • Published 6 days ago • 72
Heterogeneous Scientific Foundation Model Collaboration Paper • 2604.27351 • Published 17 days ago • 213
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph Paper • 2511.00086 • Published Oct 29, 2025 • 42
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26, 2025 • 137
MIRIX: Multi-Agent Memory System for LLM-Based Agents Paper • 2507.07957 • Published Jul 10, 2025 • 80
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance Paper • 2506.06444 • Published Jun 6, 2025 • 73
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published May 30, 2025 • 97