TFPI Collection ICLR 2026: Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners https://arxiv.org/abs/2509.26226 • 14 items • Updated Feb 12 • 1
Composition-RL Collection Datasets and trained checkpoints of Composition-RL • 13 items • Updated about 18 hours ago • 1
HARE: HumAn pRiors, a key to small language model Efficiency Paper • 2406.11410 • Published Jun 17, 2024 • 40
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings Paper • 2603.13594 • Published 4 days ago • 117
LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval Paper • 2603.01425 • Published 16 days ago • 6
AgentStepper: Interactive Debugging of Software Development Agents Paper • 2602.06593 • Published Feb 6 • 1
CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization Paper • 2511.19661 • Published Nov 24, 2025 • 3
PFPO Collection Resources for the paper Preference Optimization for Reasoning with Pseudo Feedback (ICLR 2025) • 4 items • Updated Feb 6, 2025 • 2
LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs Paper • 2602.00462 • Published Jan 31 • 19
Humans and LLMs Diverge on Probabilistic Inferences Paper • 2602.23546 • Published 19 days ago • 13
LLM2Vec-Gen: Generative Embeddings from Large Language Models Paper • 2603.10913 • Published 6 days ago • 38
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression Paper • 2510.13999 • Published Oct 15, 2025 • 15
MEDVISTAGYM: A Scalable Training Environment for Thinking with Medical Images via Tool-Integrated Reinforcement Learning Paper • 2601.07107 • Published Jan 12 • 1
LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues Paper • 2507.13681 • Published Jul 18, 2025 • 1
Guided Decoding and Its Critical Role in Retrieval-Augmented Generation Paper • 2509.06631 • Published Sep 8, 2025 • 12
PANDA (Pedantic ANswer-correctness Determination and Adjudication): Improving Automatic Evaluation for Question Answering and Text Generation Paper • 2402.11161 • Published Feb 17, 2024 • 2
Steer2Adapt: Dynamically Composing Steering Vectors Elicits Efficient Adaptation of LLMs Paper • 2602.07276 • Published Feb 7 • 11
Emotionally Charged, Logically Blurred: AI-driven Emotional Framing Impairs Human Fallacy Detection Paper • 2510.09695 • Published Oct 9, 2025 • 1