In-Context Reinforcement Learning for Tool Use in Large Language Models Paper • 2603.08068 • Published 4 days ago • 20
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning Paper • 2510.27492 • Published Oct 30, 2025 • 86
Efficient Process Reward Model Training via Active Learning Paper • 2504.10559 • Published Apr 14, 2025 • 13
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70