RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation Paper • 2601.08430 • Published 10 days ago • 53
EvasionBench: Detecting Evasive Answers in Financial Q&A via Multi-Model Consensus and LLM-as-Judge Paper • 2601.09142 • Published 10 days ago • 9
Controlled Self-Evolution for Algorithmic Code Optimization Paper • 2601.07348 • Published 11 days ago • 110
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published 12 days ago • 206
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting Paper • 2601.02151 • Published 18 days ago • 101
Region-Constraint In-Context Generation for Instructional Video Editing Paper • 2512.17650 • Published Dec 19, 2025 • 51
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows Paper • 2512.13168 • Published Dec 15, 2025 • 50
The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment Paper • 2511.20614 • Published Nov 25, 2025 • 38
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation Paper • 2511.19320 • Published Nov 24, 2025 • 42
UniREditBench: A Unified Reasoning-based Image Editing Benchmark Paper • 2511.01295 • Published Nov 3, 2025 • 39
TrajSelector: Harnessing Latent Representations for Efficient and Effective Best-of-N in Large Reasoning Model Paper • 2510.16449 • Published Oct 18, 2025 • 35
LongCodeZip: Compress Long Context for Code Language Models Paper • 2510.00446 • Published Oct 1, 2025 • 107
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation Paper • 2509.16198 • Published Sep 19, 2025 • 126
Table-R1: Inference-Time Scaling for Table Reasoning Paper • 2505.23621 • Published May 29, 2025 • 93
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models Paper • 2504.15716 • Published Apr 22, 2025 • 12