ClawBench: Can AI Agents Complete Everyday Online Tasks? • Paper • 2604.08523 • Published 3 days ago • 98
MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning • Paper • 2603.03379 • Published Mar 3 • 32
BabyVision • Collection • State-of-the-art MLLMs achieve PhD-level language reasoning but struggle with visual tasks that 3-year-olds solve effortlessly. • 2 items • Updated Jan 10 • 4