Keyu Duan

vermouthdky

https://kduan.live

AI & ML interests

LLM Reasoning and Safety

Recent Activity

authored a paper 1 day ago

In-Context Reinforcement Learning for Tool Use in Large Language Models

Paper • 2603.08068 • Published 4 days ago • 20

upvoted an article 28 days ago

Forge: Scalable Agent RL Framework and Algorithm

Article • 28 days ago • 137

upvoted 3 papers 4 months ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 129

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published Oct 30, 2025 • 31

ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

Paper • 2510.27492 • Published Oct 30, 2025 • 86

updated a dataset 5 months ago

axon-rl/webshop_instructions

Viewer • Updated Oct 27, 2025 • 6.91k • 27

published a dataset 5 months ago

axon-rl/webshop_instructions

Viewer • Updated Oct 27, 2025 • 6.91k • 27

updated a dataset 5 months ago

axon-rl/webshop

Viewer • Updated Oct 27, 2025 • 1k • 24

published a dataset 5 months ago

axon-rl/webshop

Viewer • Updated Oct 27, 2025 • 1k • 24

authored 2 papers 5 months ago

Efficient Process Reward Model Training via Active Learning

Paper • 2504.10559 • Published Apr 14, 2025 • 13

GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1, 2025 • 90

upvoted a paper 5 months ago

GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1, 2025 • 90

upvoted 2 papers 6 months ago

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26, 2025 • 70

Variational Reasoning for Language Models

Paper • 2509.22637 • Published Sep 26, 2025 • 69

upvoted 3 papers 10 months ago

upvoted a paper 11 months ago

Efficient Process Reward Model Training via Active Learning

Paper • 2504.10559 • Published Apr 14, 2025 • 13

updated 2 models 11 months ago

sail/ActPRM-X

7B • Updated Apr 15, 2025 • 4

sail/ActPRM

7B • Updated Apr 15, 2025 • 3
