Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling Paper • 2605.05922 • Published 11 days ago • 4 • 2
TIDE: Every Layer Knows the Token Beneath the Context Paper • 2605.06216 • Published 11 days ago • 8 • 3
BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models Paper • 2605.05758 • Published 11 days ago • 4 • 3
EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM Student Handwritten Solutions Paper • 2602.00095 • Published 18 days ago • 3 • 3
Recovering Hidden Reward in Diffusion-Based Policies Paper • 2605.00623 • Published 17 days ago • 4 • 3
The Scaling Properties of Implicit Deductive Reasoning in Transformers Paper • 2605.04330 • Published 13 days ago • 5 • 3
KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels Paper • 2605.04956 • Published 12 days ago • 7 • 4
When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels Paper • 2605.06652 • Published 11 days ago • 5 • 3
Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance Paper • 2605.06535 • Published 11 days ago • 3 • 3
Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study Paper • 2605.06643 • Published 11 days ago • 4 • 3
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction Paper • 2605.05242 • Published 15 days ago • 107 • 3
The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models Paper • 2605.06196 • Published 11 days ago • 7 • 3
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key Paper • 2605.06638 • Published 11 days ago • 14 • 5
AI Co-Mathematician: Accelerating Mathematicians with Agentic AI Paper • 2605.06651 • Published 11 days ago • 15 • 3
Audio-Visual Intelligence in Large Foundation Models Paper • 2605.04045 • Published 13 days ago • 32 • 3
RemoteZero: Geospatial Reasoning with Zero Human Annotations Paper • 2605.04451 • Published 12 days ago • 8 • 3
SkillOS: Learning Skill Curation for Self-Evolving Agents Paper • 2605.06614 • Published 11 days ago • 42 • 3
SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation Paper • 2605.06356 • Published 11 days ago • 5 • 4
TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding Paper • 2605.04962 • Published 12 days ago • 8 • 3
ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving Paper • 2605.04647 • Published 12 days ago • 9 • 3