BigCode

non-profit

https://www.bigcode-project.org/

bigcode-project

Recent Activity

Elfsong authored a paper 18 days ago

Mastermind: Strategy-grounded Learning for Repository-Scale Vulnerability Reproduction

Elfsong submitted a paper 18 days ago

Mastermind: Strategy-grounded Learning for Repository-Scale Vulnerability Reproduction

cs-mshah authored a paper 30 days ago

MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World

Papers

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Articles

BigCodeArena: Judging code generations end to end with code executions

authored 2 papers 30 days ago

MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World

Paper • 2504.15397 • Published Apr 21, 2025

Lift4D: Harmonizing Single-View 3D Estimation for 4D Reconstruction In-the-Wild

Paper • 2606.23688 • Published Jun 22 • 5

authored 8 papers about 1 month ago

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Paper • 2303.03915 • Published Mar 7, 2023 • 8

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Paper • 2206.11249 • Published Jun 22, 2022

BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing

Paper • 2206.15076 • Published Jun 30, 2022 • 5

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 39

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

Paper • 2411.16508 • Published Nov 25, 2024 • 10

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

Paper • 2411.19799 • Published Nov 29, 2024 • 16

Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations

Paper • 2511.05613 • Published Nov 6, 2025

Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

Paper • 2606.09809 • Published Jun 8 • 4

lckr

authored a paper about 2 months ago

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 157

authored a paper about 2 months ago

Who Annotates in NLP? A Large-scale Assessment of Human Annotation Reporting between 2018 and 2025

Paper • 2606.02255 • Published Jun 1

submitted a paper to Daily Papers about 2 months ago

Who Annotates in NLP? A Large-scale Assessment of Human Annotation Reporting between 2018 and 2025

Paper • 2606.02255 • Published Jun 1

authored 6 papers about 2 months ago

OS-MAP: How Far Can Computer-Using Agents Go in Breadth and Depth?

Paper • 2507.19132 • Published Jul 25, 2025

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Paper • 2510.24702 • Published Oct 28, 2025 • 32

OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents

Paper • 2510.24563 • Published Oct 28, 2025 • 23

Qwen3-VL Technical Report

Paper • 2511.21631 • Published Nov 26, 2025 • 164

RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

Paper • 2602.02488 • Published Feb 2 • 36

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

Paper • 2605.25624 • Published May 25 • 35

submitted a paper to Daily Papers 3 months ago

WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning

Paper • 2604.20398 • Published Apr 22 • 3