Approaching Human-Level Forecasting with Language Models Paper • 2402.18563 • Published Feb 28, 2024
ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities Paper • 2409.19839 • Published Sep 30, 2024
SAGE-Eval: Evaluating LLMs for Systematic Generalizations of Safety Facts Paper • 2505.21828 • Published May 27, 2025
Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors Paper • 2506.10949 • Published Jun 12, 2025
Reasoning Models Struggle to Control their Chains of Thought Paper • 2603.05706 • Published 8 days ago