SWE-bench Collection: SWE-bench is a benchmark for evaluating Language Models and AI Systems on their ability to resolve real-world GitHub Issues. • 4 items • Updated Mar 8, 2025