Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers Paper • 2601.04890 • Published 15 days ago
Nested Learning: The Illusion of Deep Learning Architectures Paper • 2512.24695 • Published 23 days ago
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published 25 days ago
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space Paper • 2512.24617 • Published 23 days ago