FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios Paper • 2604.07413 • Published 11 days ago • 93
PulseLM: A Foundation Dataset and Benchmark for PPG-Text Learning Paper • 2603.03331 • Published Feb 10 • 2
MemoryArena: Benchmarking Agent Memory in Interdependent Multi-Session Agentic Tasks Paper • 2602.16313 • Published Feb 18 • 3
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks Paper • 2604.11778 • Published 6 days ago • 8
bitext/Bitext-customer-support-llm-chatbot-training-dataset Viewer • Updated Jul 18, 2024 • 26.9k • 3.1k • 160