Lost in the Noise: How Reasoning Models Fail with Contextual Distractors Paper • 2601.07226 • Published 21 days ago • 32
Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning Paper • 2601.03872 • Published 26 days ago • 42
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment Paper • 2601.01576 • Published 29 days ago • 18
COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs Paper • 2601.01836 • Published 28 days ago • 10
OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs Paper • 2601.01592 • Published 29 days ago • 12
Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models Paper • 2601.01321 • Published 29 days ago • 18
Confidence Estimation for LLMs in Multi-turn Interactions Paper • 2601.02179 • Published 28 days ago • 16
The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving Paper • 2601.00747 • Published about 1 month ago • 20
SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving Paper • 2601.01426 • Published 29 days ago • 22
MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning Paper • 2512.23412 • Published Dec 29, 2025 • 39
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning Paper • 2512.24330 • Published Dec 30, 2025 • 35
MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization Paper • 2601.01554 • Published 29 days ago • 57
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published Dec 31, 2025 • 119