Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following Paper • 2508.02150 • Published Aug 4, 2025
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents Paper • 2507.03112 • Published Jul 3, 2025
Learning to Decode Collaboratively with Multiple Language Models Paper • 2403.03870 • Published Mar 6, 2024