MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models Paper • 2603.28590 • Published 8 days ago • 21
reasoningMIA/QwQ_Benchmark_Distill_sharegpt_multi_domains_r1_style_member • Updated Nov 15, 2025 • 1.33k • 11
reasoningMIA/OpenThoughts3-10K_member_conta3_member_multi_domains • Updated Nov 15, 2025 • 14k • 6
reasoningMIA/QwQ_Benchmark_Distill_sharegpt_multi_domains_r1_style • Updated Nov 15, 2025 • 2.66k • 9