ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation Paper • 2601.21420 • Published 3 days ago • 31
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning Paper • 2508.18756 • Published Aug 26, 2025 • 36
Frac-Connections: Fractional Extension of Hyper-Connections Paper • 2503.14125 • Published Mar 18, 2025 • 22
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published Jan 28, 2025 • 31