xbench

community

https://xbench.org/

Recent Activity

huxueyu

submitted a paper to Daily Papers 8 days ago

EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies

Paper • 2602.09514 • Published 10 days ago • 9

huxueyu

submitted a paper to Daily Papers 17 days ago

AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios

Paper • 2601.20613 • Published 23 days ago • 10

Lucky2022

updated a dataset 21 days ago

xbench/AgentIF-OneDay

Viewer • Updated 21 days ago • 58 • 331 • 3

Lucky2022

published a dataset about 1 month ago

xbench/AgentIF-OneDay

Viewer • Updated 21 days ago • 58 • 331 • 3

huxueyu

updated a dataset about 1 month ago

xbench/AgentIF-OneDay

Viewer • Updated 21 days ago • 58 • 331 • 3

huxueyu

in xbench/AgentIF-OneDay about 1 month ago

Update README.md

#8 opened about 1 month ago by

Update README.md

#7 opened about 1 month ago by

Create README.md

#6 opened about 1 month ago by

Delete README.md

#5 opened about 1 month ago by

Upload data.jsonl

#4 opened about 1 month ago by

Upload 132 files

#3 opened about 1 month ago by

Upload 132 files

#2 opened about 1 month ago by

Upload data.jsonl

#1 opened about 1 month ago by

Lucky2022

authored a paper 3 months ago

Virtual Width Networks

Paper • 2511.11238 • Published Nov 14, 2025 • 38

lyangpku

published a dataset 4 months ago

xbench/DeepSearch-2510

Viewer • Updated Oct 24, 2025 • 100 • 230 • 2

lyangpku

updated a dataset 4 months ago

xbench/DeepSearch-2510

Viewer • Updated Oct 24, 2025 • 100 • 230 • 2

Lucky2022

authored a paper 8 months ago

xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations

Paper • 2506.13651 • Published Jun 16, 2025 • 8

lyangpku

updated 2 datasets 8 months ago

xbench/ScienceQA

Viewer • Updated Jun 18, 2025 • 100 • 26 • 8

xbench/DeepSearch

Viewer • Updated Jun 18, 2025 • 100 • 236 • 12

lyangpku

published a dataset 9 months ago

xbench/DeepSearch

Viewer • Updated Jun 18, 2025 • 100 • 236 • 12