Efficient Long-context Language Model Training by Core Attention Disaggregation Paper • 2510.18121 • Published Oct 20, 2025 • 122
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data Paper • 2510.09781 • Published Oct 10, 2025 • 26
Article BigCodeArena: Judging code generations end to end with code executions Oct 7, 2025 • 19
CoDA: Agentic Systems for Collaborative Data Visualization Paper • 2510.03194 • Published Oct 3, 2025 • 28
TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments Paper • 2510.01179 • Published Oct 1, 2025 • 25
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28, 2025 • 174
VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL Paper • 2505.23977 • Published May 29, 2025 • 10
Personalized Safety in LLMs: A Benchmark and A Planning-Based Agent Approach Paper • 2505.18882 • Published May 24, 2025 • 14
TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning Paper • 2505.14625 • Published May 20, 2025 • 13
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding Paper • 2503.02951 • Published Mar 4, 2025 • 33
KodCode-V1 Collection KodCode-V1 is the largest fully-synthetic open-source dataset providing verifiable solutions and tests for coding tasks. • 6 items • Updated Apr 2, 2025 • 5
Small Models Struggle to Learn from Strong Reasoners Paper • 2502.12143 • Published Feb 17, 2025 • 39
Magpie Reasoning Datasets Collection Reasoning datasets built by Magpie and its friends! • 8 items • Updated Jan 27, 2025 • 11
Article Fine-tune a SmolLM on domain-specific synthetic data from a LLM Jan 3, 2025 • 37