Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes • Paper 2603.25562 • Published about 1 month ago
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation • Paper 2602.12125 • Published Feb 12