LLaDA2.1: Speeding Up Text Diffusion via Token Editing Paper • 2602.08676 • Published 7 days ago • 64
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published 11 days ago • 314
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning Paper • 2602.10560 • Published 6 days ago • 27
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Paper • 2602.10604 • Published 5 days ago • 174
Article Performant local mixture-of-experts CPU inference with GPU acceleration in llama.cpp 18 days ago • 10
Article Fine-Tuning FunctionGemma on TPU to Create a Virtual Fitness Coach in 10 Minutes, $0.50 14 days ago • 13
Article From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output 9 days ago • 18
Article Training Qwen3 VL to label bbox : synthetic data, environment and training analysis 7 days ago • 5
F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare Paper • 2602.06717 • Published 10 days ago • 70
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger Paper • 2602.08222 • Published 8 days ago • 253
QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining Paper • 2602.07085 • Published 10 days ago • 180
Shaping capabilities with token-level data filtering Paper • 2601.21571 • Published 18 days ago • 26
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation Paper • 2601.21420 • Published 18 days ago • 42
Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives Paper • 2601.20833 • Published 19 days ago • 176
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published 17 days ago • 99