SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration Paper • 2603.03823 • Published 16 days ago • 6
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents Paper • 2505.20411 • Published May 26, 2025 • 95
SteerLM Collection A collection of models and datasets relating to SteerLM and HelpSteer. • 7 items • Updated 3 days ago • 15
Mixture-of-Agents Enhances Large Language Model Capabilities Paper • 2406.04692 • Published Jun 7, 2024 • 59
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval Paper • 2401.18059 • Published Jan 31, 2024 • 48