Beyond Imitation: Reinforcement Learning for Active Latent Planning • Paper • 2601.21598 • Published 11 days ago • 9
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization • Paper • 2511.06411 • Published Nov 9, 2025 • 18