Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning Paper • 2602.11149 • Published 8 days ago • 12
What Layers When: Learning to Skip Compute in LLMs with Residual Gates Paper • 2510.13876 • Published Oct 13, 2025 • 11 • 2
KV Cache Steering for Inducing Reasoning in Small Language Models Paper • 2507.08799 • Published Jul 11, 2025 • 40
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation Paper • 2505.06027 • Published May 9, 2025 • 18
The Ultra-Scale Playbook 🌌 • Running • 3.7k • The ultimate guide to training LLMs on large GPU clusters