Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published
• 509
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Paper
• 2510.03215
• Published
• 98
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
Paper
• 2510.07499
• Published
• 48
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Paper
• 2510.09608
• Published
• 51
LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning
Paper
• 2510.14211
• Published
• 9
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
Paper
• 2510.19338
• Published
• 115
LightMem: Lightweight and Efficient Memory-Augmented Generation
Paper
• 2510.18866
• Published
• 114
Glyph: Scaling Context Windows via Visual-Text Compression
Paper
• 2510.17800
• Published
• 68
DeepSeek-OCR: Contexts Optical Compression
Paper
• 2510.18234
• Published
• 93
Deep Self-Evolving Reasoning
Paper
• 2510.17498
• Published
• 12
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs
Paper
• 2510.24514
• Published
• 22
The End of Manual Decoding: Towards Truly End-to-End Language Models
Paper
• 2510.26697
• Published
• 117
Exploring Conditions for Diffusion models in Robotic Control
Paper
• 2510.15510
• Published
• 40
Kimi Linear: An Expressive, Efficient Attention Architecture
Paper
• 2510.26692
• Published
• 127
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction
Paper
• 2505.11254
• Published
• 48