How Should I Build A Benchmark? Revisiting Code-Related Benchmarks For LLMs Paper • 2501.10711 • Published Jan 18, 2025 • 1
What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond Paper • 2503.20589 • Published Mar 26, 2025 • 1
EffiReasonTrans: RL-Optimized Reasoning for Code Translation Paper • 2510.18863 • Published Oct 21, 2025 • 1
Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey Paper • 2601.11655 • Published Jan 15, 2026 • 61
No More Manual Tests? Evaluating and Improving ChatGPT for Unit Test Generation Paper • 2305.04207 • Published May 7, 2023 • 1
Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation Paper • 2308.01240 • Published Aug 2, 2023 • 2
Are Decoder-Only Large Language Models the Silver Bullet for Code Search? Paper • 2410.22240 • Published Oct 29, 2024 • 1
A Hierarchical and Evolvable Benchmark for Fine-Grained Code Instruction Following with Multi-Turn Feedback Paper • 2507.00699 • Published Jul 1, 2025 • 1
ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation Paper • 2308.01861 • Published Aug 3, 2023 • 2
Generating High-Quality Datasets for Code Editing via Open-Source Language Models Paper • 2509.25203 • Published Sep 19, 2025 • 1