HUMAN-WRITTEN & LEGALLY-SOURCED* Collection • Datasets written by humans and/or reverse-engineered from text with deterministic algorithms. No illegal scraping or unethical synthesis (*...mostly). • 161 items • Updated about 14 hours ago • 2
ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning Paper • 2603.05863 • Published 7 days ago • 4
Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards Paper • 2603.09117 • Published 3 days ago • 5
The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness Paper • 2603.09200 • Published 3 days ago • 5
Test-Driven AI Agent Definition (TDAD): Compiling Tool-Using Agents from Behavioral Specifications Paper • 2603.08806 • Published 4 days ago • 7
Do What I Say: A Spoken Prompt Dataset for Instruction-Following Paper • 2603.09881 • Published 3 days ago • 7
Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering Paper • 2603.06854 • Published 7 days ago • 11
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs Paper • 2603.09095 • Published 4 days ago • 23
Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports Paper • 2603.09896 • Published 3 days ago • 24
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion Paper • 2603.06577 • Published 7 days ago • 43
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data Paper • 2603.09206 • Published 3 days ago • 41
Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs Paper • 2603.09906 • Published 3 days ago • 58
CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR Paper • 2603.10101 • Published 3 days ago • 3
Lost in Backpropagation: The LM Head is a Gradient Bottleneck Paper • 2603.10145 • Published 3 days ago • 4