Topology-Preserving Neural Operator Learning via Hodge Decomposition Paper • 2605.13834 • Published 3 days ago • 3
STALE: Can LLM Agents Know When Their Memories Are No Longer Valid? Paper • 2605.06527 • Published 9 days ago • 37
Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning Paper • 2605.14386 • Published 2 days ago • 50
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published 3 days ago • 133
Metal-Sci: A Scientific Compute Benchmark for Evolutionary LLM Kernel Search on Apple Silicon Paper • 2605.09708 • Published 6 days ago • 4
NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation Paper • 2605.10813 • Published 5 days ago • 12
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs Paper • 2605.09063 • Published 7 days ago • 76
Virus-Human-Host-Signals Collection • Collection of datasets and model weights I've put together to develop genomic host detection models. • 4 items • Updated 9 days ago
SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment Paper • 2605.04012 • Published 11 days ago • 11
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration Paper • 2605.03042 • Published 12 days ago • 114
Assessing Pancreatic Ductal Adenocarcinoma Vascular Invasion: the PDACVI Benchmark Paper • 2604.27582 • Published 16 days ago • 4
Hallucinations Undermine Trust; Metacognition is a Way Forward Paper • 2605.01428 • Published 14 days ago • 23
PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments Paper • 2605.02240 • Published 12 days ago • 9
MolmoAct2: Action Reasoning Models for Real-world Deployment Paper • 2605.02881 • Published 12 days ago • 326