Library - a JuanRafap Collection

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8, 2025 • 226 • 98

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 38

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30, 2025 • 97

QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23, 2025 • 88

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30, 2025 • 144

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Paper • 2505.24298 • Published May 30, 2025 • 28

Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

Paper • 2505.19914 • Published May 26, 2025 • 46

One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23, 2025 • 62

Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models

Paper • 2505.14810 • Published May 20, 2025 • 62

Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

Paper • 2505.16410 • Published May 22, 2025 • 58

JULI: Jailbreak Large Language Models by Self-Introspection

Paper • 2505.11790 • Published May 17, 2025 • 1

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Paper • 2505.13438 • Published May 19, 2025 • 36

Reward Reasoning Model

Paper • 2505.14674 • Published May 20, 2025 • 37

RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5, 2025 • 81

CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models

Paper • 2505.12504 • Published May 18, 2025 • 24

Neuro-Symbolic Query Compiler

Paper • 2505.11932 • Published May 17, 2025 • 18

Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models

Paper • 2506.01320 • Published Jun 2, 2025 • 16

Aligning Latent Spaces with Flow Priors

Paper • 2506.05240 • Published Jun 5, 2025 • 27

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Paper • 2506.00070 • Published May 29, 2025 • 29

A Controllable Examination for Long-Context Language Models

Paper • 2506.02921 • Published Jun 3, 2025 • 33

MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs

Paper • 2506.01674 • Published Jun 2, 2025 • 28

CodeContests+: High-Quality Test Case Generation for Competitive Programming

Paper • 2506.05817 • Published Jun 6, 2025 • 9

FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion

Paper • 2506.01111 • Published Jun 1, 2025 • 31

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 263

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

Paper • 2506.08012 • Published Jun 9, 2025 • 7

Dreamland: Controllable World Creation with Simulator and Generative Models

Paper • 2506.08006 • Published Jun 9, 2025 • 7

Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance

Paper • 2506.06444 • Published Jun 6, 2025 • 73

BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

Paper • 2506.07530 • Published Jun 9, 2025 • 20

Solving Inequality Proofs with Large Language Models

Paper • 2506.07927 • Published Jun 9, 2025 • 20

Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better

Paper • 2506.09040 • Published Jun 10, 2025 • 34

Through the Valley: Path to Effective Long CoT Training for Small Language Models

Paper • 2506.07712 • Published Jun 9, 2025 • 18

Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework

Paper • 2506.02454 • Published Jun 3, 2025 • 7

Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation

Paper • 2506.04614 • Published Jun 5, 2025 • 19

Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning

Paper • 2506.06205 • Published Jun 6, 2025 • 30

Magistral

Paper • 2506.10910 • Published Jun 12, 2025 • 66

Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team

Paper • 2506.14234 • Published Jun 17, 2025 • 41

Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers

Paper • 2506.14702 • Published Jun 17, 2025 • 3

AR-RAG: Autoregressive Retrieval Augmentation for Image Generation

Paper • 2506.06962 • Published Jun 8, 2025 • 28

DoTA-RAG: Dynamic of Thought Aggregation RAG

Paper • 2506.12571 • Published Jun 14, 2025 • 50

syftr: Pareto-Optimal Generative AI

Paper • 2505.20266 • Published May 26, 2025

Scaling Test-time Compute for LLM Agents

Paper • 2506.12928 • Published Jun 15, 2025 • 63

LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning

Paper • 2506.10082 • Published Jun 11, 2025 • 8

General-Reasoner: Advancing LLM Reasoning Across All Domains

Paper • 2505.14652 • Published May 20, 2025 • 24

Optimizing Length Compression in Large Reasoning Models

Paper • 2506.14755 • Published Jun 17, 2025 • 10

UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation

Paper • 2506.17202 • Published Jun 20, 2025 • 10

ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs

Paper • 2506.18896 • Published Jun 23, 2025 • 29

Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding

Paper • 2506.16035 • Published Jun 19, 2025 • 89

Robust Reward Modeling via Causal Rubrics

Paper • 2506.16507 • Published Jun 19, 2025 • 9

LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion

Paper • 2507.02813 • Published Jul 3, 2025 • 60

FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model

Paper • 2507.01953 • Published Jul 2, 2025 • 18

Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective

Paper • 2506.17930 • Published Jun 22, 2025 • 19

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Paper • 2506.24119 • Published Jun 30, 2025 • 50

katanemo/Arch-Router-1.5B

Text Generation • Updated Nov 16, 2025 • 2.5k • • 246

Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky

Paper • 2507.03336 • Published Jul 4, 2025 • 7

SingLoRA: Low Rank Adaptation Using a Single Matrix

Paper • 2507.05566 • Published Jul 8, 2025 • 115

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8, 2025 • 45

AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

Paper • 2507.05687 • Published Jul 8, 2025 • 30

Coding Triangle: How Does Large Language Model Understand Code?

Paper • 2507.06138 • Published Jul 8, 2025 • 22

High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning

Paper • 2507.05920 • Published Jul 8, 2025 • 12

RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs

Paper • 2507.03253 • Published Jul 4, 2025 • 19

Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs

Paper • 2507.07996 • Published Jul 10, 2025 • 36

Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective

Paper • 2507.08801 • Published Jul 11, 2025 • 31

A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17, 2025 • 261

WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

Paper • 2507.15061 • Published Jul 20, 2025 • 60

AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning

Paper • 2507.12841 • Published Jul 17, 2025 • 42

Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Paper • 2507.16784 • Published Jul 22, 2025 • 122

MUR: Momentum Uncertainty guided Reasoning for Large Language Models

Paper • 2507.14958 • Published Jul 20, 2025 • 47

Does More Inference-Time Compute Really Help Robustness?

Paper • 2507.15974 • Published Jul 21, 2025 • 7

RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback

Paper • 2507.15024 • Published Jul 20, 2025 • 14

ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting

Paper • 2507.15454 • Published Jul 21, 2025 • 7

Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models

Paper • 2507.14241 • Published Jul 17, 2025 • 18

TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation

Paper • 2507.18537 • Published Jul 24, 2025 • 18

Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos

Paper • 2507.15597 • Published Jul 21, 2025 • 34

A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning

Paper • 2507.14295 • Published Jul 18, 2025 • 14

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction

Paper • 2507.15852 • Published Jul 21, 2025 • 38

FLEXITOKENS: Flexible Tokenization for Evolving Language Models

Paper • 2507.12720 • Published Jul 17, 2025 • 10

RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization

Paper • 2507.12142 • Published Jul 16, 2025 • 37

Replacing thinking with tool usage enables reasoning in small language models

Paper • 2507.05065 • Published Jul 7, 2025 • 16

Lizard: An Efficient Linearization Framework for Large Language Models

Paper • 2507.09025 • Published Jul 11, 2025 • 19

MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4, 2025 • 159

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 158

Deep Researcher with Test-Time Diffusion

Paper • 2507.16075 • Published Jul 21, 2025 • 68

SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment

Paper • 2507.20984 • Published Jul 28, 2025 • 58

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

Paper • 2507.19478 • Published Jul 25, 2025 • 32

Geometric-Mean Policy Optimization

Paper • 2507.20673 • Published Jul 28, 2025 • 32

UloRL:An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities

Paper • 2507.19766 • Published Jul 26, 2025 • 15

VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning

Paper • 2507.22607 • Published Jul 30, 2025 • 47

Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models

Paper • 2508.00819 • Published Aug 1, 2025 • 63

Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following

Paper • 2508.02150 • Published Aug 4, 2025 • 37

Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

Paper • 2508.00414 • Published Aug 1, 2025 • 94

On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective

Paper • 2507.23632 • Published Jul 31, 2025 • 6

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Paper • 2507.23726 • Published Jul 31, 2025 • 115

SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension

Paper • 2508.01959 • Published Aug 3, 2025 • 59

Tool-integrated Reinforcement Learning for Repo Deep Search

Paper • 2508.03012 • Published Aug 5, 2025 • 20

InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

Paper • 2508.05731 • Published Aug 7, 2025 • 27

MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

Paper • 2508.01242 • Published Aug 2, 2025 • 11

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11, 2025 • 50

Reinforcement Learning in Vision: A Survey

Paper • 2508.08189 • Published Aug 11, 2025 • 30

Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents

Paper • 2508.05954 • Published Aug 8, 2025 • 6

Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments

Paper • 2508.08791 • Published Aug 12, 2025 • 16

Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning

Paper • 2508.03501 • Published Aug 5, 2025 • 59

Complex Logical Instruction Generation

Paper • 2508.09125 • Published Aug 12, 2025 • 40

Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery

Paper • 2508.08401 • Published Aug 11, 2025 • 42

Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing

Paper • 2508.09192 • Published Aug 8, 2025 • 30

Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping

Paper • 2508.12466 • Published Aug 17, 2025 • 8

Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

Paper • 2508.13142 • Published Aug 18, 2025 • 34

VertexRegen: Mesh Generation with Continuous Level of Detail

Paper • 2508.09062 • Published Aug 12, 2025 • 38

XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

Paper • 2508.10395 • Published Aug 14, 2025 • 42

STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer

Paper • 2508.10893 • Published Aug 14, 2025 • 31

FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction

Paper • 2508.11987 • Published Aug 16, 2025 • 71

MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14, 2025 • 19

Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

Paper • 2508.10751 • Published Aug 14, 2025 • 29

UI-Venus Technical Report: Building High-performance UI Agents with RFT

Paper • 2508.10833 • Published Aug 14, 2025 • 45

Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models

Paper • 2508.09968 • Published Aug 13, 2025 • 15

CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search

Paper • 2508.02091 • Published Aug 4, 2025 • 13

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Paper • 2508.15760 • Published Aug 21, 2025 • 47

Deep Think with Confidence

Paper • 2508.15260 • Published Aug 21, 2025 • 90

DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization

Paper • 2508.14460 • Published Aug 20, 2025 • 85

Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

Paper • 2508.14896 • Published Aug 20, 2025 • 22

PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs

Paper • 2508.17188 • Published Aug 24, 2025 • 17

Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

Paper • 2508.16949 • Published Aug 23, 2025 • 24

Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation

Paper • 2508.18032 • Published Aug 25, 2025 • 41

Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling

Paper • 2508.16745 • Published Aug 22, 2025 • 29

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published Aug 26, 2025 • 36

Do What? Teaching Vision-Language-Action Models to Reject the Impossible

Paper • 2508.16292 • Published Aug 22, 2025 • 9

MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting

Paper • 2508.17811 • Published Aug 25, 2025 • 7

FastMesh:Efficient Artistic Mesh Generation via Component Decoupling

Paper • 2508.19188 • Published Aug 26, 2025 • 17

Spacer: Towards Engineered Scientific Inspiration

Paper • 2508.17661 • Published Aug 25, 2025 • 32

ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Paper • 2508.18773 • Published Aug 26, 2025 • 16

VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published Aug 26, 2025 • 43

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published Aug 24, 2025 • 80

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published Aug 28, 2025 • 89

Provable Benefits of In-Tool Learning for Large Language Models

Paper • 2508.20755 • Published Aug 28, 2025 • 11

Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

Paper • 2508.21365 • Published Aug 29, 2025 • 29

Efficient Code Embeddings from Code Generation Models

Paper • 2508.21290 • Published Aug 29, 2025 • 19

CLIPSym: Delving into Symmetry Detection with CLIP

Paper • 2508.14197 • Published Aug 19, 2025 • 8

Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR

Paper • 2509.02522 • Published Sep 2, 2025 • 26

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published Sep 2, 2025 • 84

Universal Deep Research: Bring Your Own Model and Strategy

Paper • 2509.00244 • Published Aug 29, 2025 • 14

LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

Paper • 2509.03405 • Published Sep 3, 2025 • 24

Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30, 2025 • 72

Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

Paper • 2509.04292 • Published Sep 4, 2025 • 58

Towards a Unified View of Large Language Model Post-Training

Paper • 2509.04419 • Published Sep 4, 2025 • 76

How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench

Paper • 2508.20931 • Published Aug 28, 2025 • 16

Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

Paper • 2509.03059 • Published Sep 3, 2025 • 25

NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings

Paper • 2509.04011 • Published Sep 4, 2025 • 29

Symbolic Graphics Programming with Large Language Models

Paper • 2509.05208 • Published Sep 5, 2025 • 47

Bootstrapping Task Spaces for Self-Improvement

Paper • 2509.04575 • Published Sep 4, 2025 • 6

Behavioral Fingerprinting of Large Language Models

Paper • 2509.04504 • Published Sep 2, 2025 • 6

Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers

Paper • 2509.06493 • Published Sep 8, 2025 • 12

Reinforcement Learning Foundations for Deep Research Systems: A Survey

Paper • 2509.06733 • Published Sep 8, 2025 • 32

Reconstruction Alignment Improves Unified Multimodal Models

Paper • 2509.07295 • Published Sep 8, 2025 • 40

Visual Representation Alignment for Multimodal Large Language Models

Paper • 2509.07979 • Published Sep 9, 2025 • 84

Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models

Paper • 2509.06949 • Published Sep 8, 2025 • 56

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9, 2025 • 105

Towards General Agentic Intelligence via Environment Scaling

Paper • 2509.13311 • Published Sep 16, 2025 • 72

WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

Paper • 2509.13305 • Published Sep 16, 2025 • 91

SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation

Paper • 2509.10708 • Published Sep 12, 2025 • 18

HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering

Paper • 2509.09713 • Published Sep 8, 2025 • 25

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18, 2025 • 116

Single-stream Policy Optimization

Paper • 2509.13232 • Published Sep 16, 2025 • 34

World Modeling with Probabilistic Structure Integration

Paper • 2509.09737 • Published Sep 10, 2025 • 14

Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning

Paper • 2509.13755 • Published Sep 17, 2025 • 19

C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning

Paper • 2507.16518 • Published Jul 22, 2025 • 2

WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance

Paper • 2509.15130 • Published Sep 18, 2025 • 30

Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

Paper • 2509.15194 • Published Sep 18, 2025 • 33

THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning

Paper • 2509.13761 • Published Sep 17, 2025 • 16

A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning

Paper • 2509.15937 • Published Sep 19, 2025 • 20

BaseReward: A Strong Baseline for Multimodal Reward Model

Paper • 2509.16127 • Published Sep 19, 2025 • 21

MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks

Paper • 2509.14638 • Published Sep 18, 2025 • 13

Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents

Paper • 2509.15233 • Published Sep 17, 2025 • 2

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Paper • 2509.16197 • Published Sep 19, 2025 • 58

Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

Paper • 2509.15591 • Published Sep 19, 2025 • 45

BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

Paper • 2509.15566 • Published Sep 19, 2025 • 14

Mano Report

Paper • 2509.17336 • Published Sep 22, 2025 • 10

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback

Paper • 2501.10799 • Published Jan 18, 2025 • 15

Table as Thought: Exploring Structured Thoughts in LLM Reasoning

Paper • 2501.02152 • Published Jan 4, 2025

Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning

Paper • 2412.09078 • Published Dec 12, 2024

TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge Internalization with Self-Reflection

Paper • 2412.08024 • Published Dec 11, 2024 • 1

LLM2: Let Large Language Models Harness System 2 Reasoning

Paper • 2412.20372 • Published Dec 29, 2024

Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective

Paper • 2501.11110 • Published Jan 19, 2025 • 4

Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning

Paper • 2412.15797 • Published Dec 20, 2024 • 18

RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement

Paper • 2412.12881 • Published Dec 17, 2024 • 2

OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models

Paper • 2509.17627 • Published Sep 22, 2025 • 66

Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation

Paper • 2509.18824 • Published Sep 23, 2025 • 23

Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory

Paper • 2509.14662 • Published Sep 18, 2025 • 13

VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models

Paper • 2509.19803 • Published Sep 24, 2025 • 120

Tree Search for LLM Agent Reinforcement Learning

Paper • 2509.21240 • Published Sep 25, 2025 • 92

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

Paper • 2509.24006 • Published Sep 28, 2025 • 118

Fine-tuning Done Right in Model Editing

Paper • 2509.22072 • Published Sep 26, 2025 • 28

No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping

Paper • 2509.21880 • Published Sep 26, 2025 • 53

LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

Paper • 2509.22414 • Published Sep 26, 2025 • 22

Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning

Paper • 2509.22601 • Published Sep 26, 2025 • 30

EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning

Paper • 2509.22576 • Published Sep 26, 2025 • 135

Variational Reasoning for Language Models

Paper • 2509.22637 • Published Sep 26, 2025 • 69

AutoIntent: AutoML for Text Classification

Paper • 2509.21138 • Published Sep 25, 2025 • 36

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Paper • 2509.25760 • Published Sep 30, 2025 • 55

Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models

Paper • 2509.26628 • Published Sep 30, 2025 • 17

Sequential Diffusion Language Models

Paper • 2509.24007 • Published Sep 28, 2025 • 46

ReviewScore: Misinformed Peer Review Detection with Large Language Models

Paper • 2509.21679 • Published Sep 25, 2025 • 64

ReviewRL: Towards Automated Scientific Review with RL

Paper • 2508.10308 • Published Aug 14, 2025 • 1

ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks

Paper • 2508.15804 • Published Aug 14, 2025 • 15

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29, 2025 • 146

Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

Paper • 2509.25849 • Published Sep 30, 2025 • 48

BroRL: Scaling Reinforcement Learning via Broadened Exploration

Paper • 2510.01180 • Published Oct 1, 2025 • 20

GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1, 2025 • 90

Interactive Training: Feedback-Driven Neural Network Optimization

Paper • 2510.02297 • Published Oct 2, 2025 • 43

More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models

Paper • 2509.25848 • Published Sep 30, 2025 • 80

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering

Paper • 2510.01591 • Published Oct 2, 2025 • 28

LongCodeZip: Compress Long Context for Code Language Models

Paper • 2510.00446 • Published Oct 1, 2025 • 107

Efficient Multi-modal Large Language Models via Progressive Consistency Distillation

Paper • 2510.00515 • Published Oct 1, 2025 • 42

Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models

Paper • 2510.03561 • Published Oct 3, 2025 • 25

Large Language Models as Optimizers

Paper • 2309.03409 • Published Sep 7, 2023 • 79

Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

Paper • 2309.08532 • Published Sep 15, 2023 • 54

PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback

Paper • 2307.14936 • Published Jul 27, 2023 • 42

Factuality Matters: When Image Generation and Editing Meet Structured Visuals

Paper • 2510.05091 • Published Oct 6, 2025 • 20

Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training

Paper • 2510.04996 • Published Oct 6, 2025 • 16

Paper2Video: Automatic Video Generation from Scientific Papers

Paper • 2510.05096 • Published Oct 6, 2025 • 119

SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs

Paper • 2510.05069 • Published Oct 6, 2025 • 13

MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information

Paper • 2510.03632 • Published Oct 4, 2025 • 42

Large Reasoning Models Learn Better Alignment from Flawed Thinking

Paper • 2510.00938 • Published Oct 1, 2025 • 59

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 509

Multi-Agent Tool-Integrated Policy Optimization

Paper • 2510.04678 • Published Oct 6, 2025 • 31

Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 273

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Paper • 2510.05034 • Published Oct 6, 2025 • 51

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published Oct 3, 2025 • 75

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published Oct 13, 2025 • 181

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs

Paper • 2510.09507 • Published Oct 10, 2025 • 11

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Paper • 2510.04618 • Published Oct 6, 2025 • 129

Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models

Paper • 2510.08492 • Published Oct 9, 2025 • 10

Dyna-Mind: Learning to Simulate from Experience for Better AI Agents

Paper • 2510.09577 • Published Oct 10, 2025 • 8

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published Oct 9, 2025 • 39

Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs

Paper • 2510.09201 • Published Oct 10, 2025 • 50

Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13, 2025 • 166

UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

Paper • 2510.13515 • Published Oct 15, 2025 • 12

Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training

Paper • 2510.12586 • Published Oct 14, 2025 • 113

Understanding DeepResearch via Reports

Paper • 2510.07861 • Published Oct 9, 2025 • 7

RAG-Anything: All-in-One RAG Framework

Paper • 2510.12323 • Published Oct 14, 2025 • 67

The Art of Scaling Reinforcement Learning Compute for LLMs

Paper • 2510.13786 • Published Oct 15, 2025 • 32

Glyph: Scaling Context Windows via Visual-Text Compression

Paper • 2510.17800 • Published Oct 20, 2025 • 68

LoongRL:Reinforcement Learning for Advanced Reasoning over Long Contexts

Paper • 2510.19363 • Published Oct 22, 2025 • 62

Unified Reinforcement and Imitation Learning for Vision-Language Models

Paper • 2510.19307 • Published Oct 22, 2025 • 32

Attention Is All You Need for KV Cache in Diffusion LLMs

Paper • 2510.14973 • Published Oct 16, 2025 • 42

Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents

Paper • 2510.14967 • Published Oct 16, 2025 • 34

Video Reasoning without Training

Paper • 2510.17045 • Published Oct 19, 2025 • 8

AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

Paper • 2510.19779 • Published Oct 22, 2025 • 61

Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall

Paper • 2510.19304 • Published Oct 22, 2025 • 24

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

Paper • 2510.20187 • Published Oct 23, 2025 • 19

ReCode: Unify Plan and Action for Universal Granularity Control

Paper • 2510.23564 • Published Oct 27, 2025 • 122

Reasoning with Sampling: Your Base Model is Smarter Than You Think

Paper • 2510.14901 • Published Oct 16, 2025 • 48

Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning

Paper • 2510.23473 • Published Oct 27, 2025 • 85

World Simulation with Video Foundation Models for Physical AI

Paper • 2511.00062 • Published Oct 28, 2025 • 44

OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

Paper • 2510.24411 • Published Oct 28, 2025 • 72

The End of Manual Decoding: Towards Truly End-to-End Language Models

Paper • 2510.26697 • Published Oct 30, 2025 • 117

The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms

Paper • 2511.04217 • Published Nov 6, 2025 • 17

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 129

Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published Oct 29, 2025 • 229

DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation

Paper • 2511.06307 • Published Nov 9, 2025 • 53

Black-Box On-Policy Distillation of Large Language Models

Paper • 2511.10643 • Published Nov 13, 2025 • 52

DoPE: Denoising Rotary Position Embedding

Paper • 2511.09146 • Published Nov 12, 2025 • 97

Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution

Paper • 2511.14210 • Published Nov 18, 2025 • 21

SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

Paper • 2511.15605 • Published Nov 19, 2025 • 24

Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

Paper • 2511.16664 • Published Nov 20, 2025 • 28

TiDAR: Think in Diffusion, Talk in Autoregression

Paper • 2511.08923 • Published Nov 12, 2025 • 128

MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning

Paper • 2511.06805 • Published Nov 10, 2025 • 13

The Path Not Taken: RLVR Provably Learns Off the Principals

Paper • 2511.08567 • Published Nov 11, 2025 • 34

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Paper • 2510.25992 • Published Oct 29, 2025 • 48

FARMER: Flow AutoRegressive Transformer over Pixels

Paper • 2510.23588 • Published Oct 27, 2025 • 59

Parallel Loop Transformer for Efficient Test-Time Computation Scaling

Paper • 2510.24824 • Published Oct 28, 2025 • 17

LLM-guided Hierarchical Retrieval

Paper • 2510.13217 • Published Oct 15, 2025 • 21

DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning

Paper • 2510.15110 • Published Oct 16, 2025 • 17

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Paper • 2510.20579 • Published Oct 23, 2025 • 56

GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms

Paper • 2511.17592 • Published Nov 17, 2025 • 119

Virtual Width Networks

Paper • 2511.11238 • Published Nov 14, 2025 • 38

Flow Map Distillation Without Data

Paper • 2511.19428 • Published Nov 24, 2025 • 6

Monet: Reasoning in Latent Visual Space Beyond Images and Language

Paper • 2511.21395 • Published Nov 26, 2025 • 18

Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs

Paper • 2511.19773 • Published Nov 24, 2025 • 10

SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space

Paper • 2511.20102 • Published Nov 25, 2025 • 28

Architecture Decoupling Is Not All You Need For Unified Multimodal Model

Paper • 2511.22663 • Published Nov 27, 2025 • 29

SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs

Paper • 2512.00722 • Published Nov 30, 2025 • 16

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 105

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

Paper • 2512.02014 • Published Dec 1, 2025 • 73

OneThinker: All-in-one Reasoning Model for Image and Video

Paper • 2512.03043 • Published Dec 2, 2025 • 33

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Paper • 2512.05591 • Published Dec 5, 2025 • 17

TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

Paper • 2512.05150 • Published Dec 3, 2025 • 76

UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers

Paper • 2512.04504 • Published Dec 4, 2025 • 18

On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral

Paper • 2512.04220 • Published Dec 3, 2025 • 16

DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

Paper • 2512.03000 • Published Dec 2, 2025 • 37

PromptBridge: Cross-Model Prompt Transfer for Large Language Models

Paper • 2512.01420 • Published Dec 1, 2025 • 11

PretrainZero: Reinforcement Active Pretraining

Paper • 2512.03442 • Published Dec 3, 2025 • 48

SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment

Paper • 2512.02807 • Published Dec 2, 2025 • 9

Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion

Paper • 2512.04926 • Published Dec 4, 2025 • 42

Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

Paper • 2512.07461 • Published Dec 8, 2025 • 78

Distribution Matching Variational AutoEncoder

Paper • 2512.07778 • Published Dec 8, 2025 • 29

TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models

Paper • 2512.08153 • Published Dec 9, 2025 • 8

InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models

Paper • 2512.08829 • Published Dec 9, 2025 • 21

Self-Improving VLM Judges Without Human Annotations

Paper • 2512.05145 • Published Dec 2, 2025 • 20

Rethinking Training Dynamics in Scale-wise Autoregressive Generation

Paper • 2512.06421 • Published Dec 6, 2025 • 7

OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

Paper • 2512.07802 • Published Dec 8, 2025 • 46

unsloth/Devstral-2-123B-Instruct-2512-GGUF

125B • Updated Dec 15, 2025 • 12.4k • 47

Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning

Paper • 2512.10534 • Published Dec 11, 2025 • 32

BEAVER: An Efficient Deterministic LLM Verifier

Paper • 2512.05439 • Published Dec 5, 2025 • 36

Vector Quantization using Gaussian Variational Autoencoder

Paper • 2512.06609 • Published Dec 7, 2025 • 1

Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs

Paper • 2512.07525 • Published Dec 8, 2025 • 59

VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction

Paper • 2511.23386 • Published Nov 28, 2025 • 16

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

Paper • 2512.10739 • Published Dec 11, 2025 • 47

OmniPSD: Layered PSD Generation with Diffusion Transformer

Paper • 2512.09247 • Published Dec 10, 2025 • 48

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Paper • 2512.13586 • Published Dec 15, 2025 • 93

KlingAvatar 2.0 Technical Report

Paper • 2512.13313 • Published Dec 15, 2025 • 43

Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

Paper • 2512.14067 • Published Dec 16, 2025 • 16

Towards Scalable Pre-training of Visual Tokenizers for Generation

Paper • 2512.13687 • Published Dec 15, 2025 • 106

HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices

Paper • 2512.14052 • Published Dec 16, 2025 • 42

Universal Reasoning Model

Paper • 2512.14693 • Published Dec 16, 2025 • 43

Image Diffusion Preview with Consistency Solver

Paper • 2512.13592 • Published Dec 15, 2025 • 8

End-to-End Training for Autoregressive Video Diffusion via Self-Resampling

Paper • 2512.15702 • Published Dec 17, 2025 • 16

STeCa: Step-level Trajectory Calibration for LLM Agent Learning

Paper • 2502.14276 • Published Feb 20, 2025 • 1

Step-GUI Technical Report

Paper • 2512.15431 • Published Dec 17, 2025 • 132

Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification

Paper • 2512.16921 • Published Dec 18, 2025 • 8

Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection

Paper • 2512.16905 • Published Dec 18, 2025 • 32

DiffusionBrowser: Interactive Diffusion Previews via Multi-Branch Decoders

Paper • 2512.13690 • Published Dec 15, 2025 • 3

Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

Paper • 2512.13607 • Published Dec 15, 2025 • 36

Adaptation of Agentic AI

Paper • 2512.16301 • Published Dec 18, 2025 • 107

QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

Paper • 2512.12967 • Published Dec 15, 2025 • 108

CoSPlan: Corrective Sequential Planning via Scene Graph Incremental Updates

Paper • 2512.10342 • Published Dec 11, 2025 • 1

UAGLNet: Uncertainty-Aggregated Global-Local Fusion Network with Cooperative CNN-Transformer for Building Extraction

Paper • 2512.12941 • Published Dec 15, 2025 • 2

TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning

Paper • 2512.13106 • Published Dec 15, 2025 • 4

Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation

Paper • 2512.13655 • Published Dec 15, 2025 • 3

Janus: Disaggregating Attention and Experts for Scalable MoE Inference

Paper • 2512.13525 • Published Dec 15, 2025 • 6

RePo: Language Models with Context Re-Positioning

Paper • 2512.14391 • Published Dec 16, 2025 • 12

VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse

Paper • 2512.14531 • Published Dec 16, 2025 • 15

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Paper • 2512.13303 • Published Dec 15, 2025 • 17

Differentiable Evolutionary Reinforcement Learning

Paper • 2512.13399 • Published Dec 15, 2025 • 22

MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives

Paper • 2512.14699 • Published Dec 16, 2025 • 28

RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics

Paper • 2512.13660 • Published Dec 15, 2025 • 37

MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119

Hybrid Attribution Priors for Explainable and Robust Model Training

Paper • 2512.14719 • Published Dec 9, 2025 • 3

WAY: Estimation of Vessel Destination in Worldwide AIS Trajectory

Paper • 2512.13190 • Published Dec 15, 2025 • 8

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Paper • 2512.19693 • Published Dec 22, 2025 • 66

Reinforcement Learning for Self-Improving Agent with Skill Library

Paper • 2512.17102 • Published Dec 18, 2025 • 36

google/gemma-scope-2

Updated Dec 19, 2025 • 71

Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

Paper • 2512.19673 • Published Dec 22, 2025 • 64

QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation

Paper • 2512.19134 • Published Dec 22, 2025 • 32

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published Dec 18, 2025 • 219

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

Paper • 2512.16969 • Published Dec 18, 2025 • 119

LongVideoAgent: Multi-Agent Reasoning with Long Videos

Paper • 2512.20618 • Published Dec 23, 2025 • 55

Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs

Paper • 2512.17206 • Published Dec 19, 2025 • 20

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

Paper • 2512.17008 • Published Dec 18, 2025 • 11

InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training

Paper • 2510.15859 • Published Oct 17, 2025 • 13

Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

Paper • 2512.14681 • Published Dec 16, 2025 • 42

Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

Paper • 2512.17351 • Published Dec 19, 2025 • 28

Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning

Paper • 2512.15687 • Published Dec 17, 2025 • 21

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

Paper • 2512.13874 • Published Dec 15, 2025 • 17

SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations

Paper • 2512.14080 • Published Dec 16, 2025 • 9

Understanding and Improving Hyperbolic Deep Reinforcement Learning

Paper • 2512.14202 • Published Dec 16, 2025 • 6

SCOPE: Prompt Evolution for Enhancing Agent Effectiveness

Paper • 2512.15374 • Published Dec 17, 2025 • 6

VOYAGER: A Training Free Approach for Generating Diverse Datasets using LLMs

Paper • 2512.12072 • Published Dec 12, 2025 • 17

DEER: Draft with Diffusion, Verify with Autoregressive Models

Paper • 2512.15176 • Published Dec 17, 2025 • 45

TabReX : Tabular Referenceless eXplainable Evaluation

Paper • 2512.15907 • Published Dec 17, 2025 • 2

Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers

Paper • 2512.16615 • Published Dec 18, 2025 • 5

AdaTooler-V: Adaptive Tool-Use for Images and Videos

Paper • 2512.16918 • Published Dec 18, 2025 • 14

REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion

Paper • 2512.16636 • Published Dec 18, 2025 • 26

Kling-Omni Technical Report

Paper • 2512.16776 • Published Dec 18, 2025 • 170

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Paper • 2512.20605 • Published Dec 23, 2025 • 62

Multi-hop Reasoning via Early Knowledge Alignment

Paper • 2512.20144 • Published Dec 23, 2025 • 7

Schoenfeld's Anatomy of Mathematical Reasoning by Language Models

Paper • 2512.19995 • Published Dec 23, 2025 • 16

TimeBill: Time-Budgeted Inference for Large Language Models

Paper • 2512.21859 • Published Dec 26, 2025 • 25

Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone

Paper • 2512.22615 • Published Dec 27, 2025 • 49

Training AI Co-Scientists Using Rubric Rewards

Paper • 2512.23707 • Published Dec 29, 2025 • 21

Masking Teacher and Reinforcing Student for Distilling Vision-Language Models

Paper • 2512.22238 • Published Dec 23, 2025 • 30

LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics

Paper • 2512.21010 • Published Dec 24, 2025 • 4

Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Paper • 2512.23447 • Published Dec 29, 2025 • 98

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 312

Evaluating Parameter Efficient Methods for RLVR

Paper • 2512.23165 • Published Dec 29, 2025 • 28

SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent

Paper • 2511.16108 • Published Nov 20, 2025

The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models

Paper • 2601.03425 • Published Jan 6 • 16

MMFormalizer: Multimodal Autoformalization in the Wild

Paper • 2601.03017 • Published Jan 6 • 105

DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs

Paper • 2601.03559 • Published Jan 7 • 14

Token-Level LLM Collaboration via FusionRoute

Paper • 2601.05106 • Published Jan 8 • 40

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Paper • 2601.05175 • Published Jan 8 • 36

ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking

Paper • 2601.06487 • Published Jan 10 • 53

Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers

Paper • 2601.04890 • Published Jan 8 • 42

MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 52

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 228

RelayLLM: Efficient Reasoning via Collaborative Decoding

Paper • 2601.05167 • Published Jan 8 • 31

LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning

Paper • 2601.10129 • Published Jan 15 • 12

Language of Thought Shapes Output Diversity in Large Language Models

Paper • 2601.11227 • Published Jan 16 • 9

NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems

Paper • 2601.11004 • Published Jan 16 • 30

Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model

Paper • 2601.15892 • Published Jan 22 • 53

The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Paper • 2601.15165 • Published Jan 21 • 72

Learning to Discover at Test Time

Paper • 2601.16175 • Published Jan 22 • 42

ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought

Paper • 2601.23184 • Published 29 days ago • 36

AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders

Paper • 2602.05027 • Published 24 days ago • 60

Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning

Paper • 2602.11748 • Published 17 days ago • 30

Voxtral Realtime

Paper • 2602.11298 • Published 17 days ago • 16

DFlash: Block Diffusion for Flash Speculative Decoding

Paper • 2602.06036 • Published 23 days ago • 42

InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions

Paper • 2602.06035 • Published 23 days ago • 23

Experiential Reinforcement Learning

Paper • 2602.13949 • Published 14 days ago • 68

REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents

Paper • 2602.14234 • Published 13 days ago • 26

Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality

Paper • 2602.14080 • Published 14 days ago • 20

On Surprising Effectiveness of Masking Updates in Adaptive Optimizers

Paper • 2602.15322 • Published 12 days ago • 9

OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models

Paper • 2305.12001 • Published May 19, 2023 • 1

SELF: Language-Driven Self-Evolution for Large Language Model

Paper • 2310.00533 • Published Oct 1, 2023 • 2

DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation

Paper • 2601.22904 • Published 29 days ago • 15

EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots

Paper • 2602.18071 • Published 9 days ago • 22

VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

Paper • 2601.02256 • Published Jan 5 • 33

GARDO: Reinforcing Diffusion Models without Reward Hacking

Paper • 2512.24138 • Published Dec 30, 2025 • 29

Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling

Paper • 2601.02346 • Published Jan 5 • 26

Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits

Paper • 2512.20578 • Published Dec 23, 2025 • 85

Recursive Language Models

Paper • 2512.24601 • Published Dec 31, 2025 • 90

COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs

Paper • 2601.01836 • Published Jan 5 • 10

Toward Stable Semi-Supervised Remote Sensing Segmentation via Co-Guidance and Co-Fusion

Paper • 2512.23035 • Published Dec 28, 2025 • 5

Confidence Estimation for LLMs in Multi-turn Interactions

Paper • 2601.02179 • Published Jan 5 • 17

SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

Paper • 2601.01426 • Published Jan 4 • 24

OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment

Paper • 2601.01576 • Published Jan 4 • 18

Project Ariadne: A Structural Causal Framework for Auditing Faithfulness in LLM Agents

Paper • 2601.02314 • Published Jan 5 • 2

M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models

Paper • 2512.22877 • Published Dec 28, 2025 • 2

Nested Learning: The Illusion of Deep Learning Architectures

Paper • 2512.24695 • Published Dec 31, 2025 • 44

Deep Delta Learning

Paper • 2601.00417 • Published Jan 1 • 34

The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving

Paper • 2601.00747 • Published Jan 2 • 20

InfoSynth: Information-Guided Benchmark Synthesis for LLMs

Paper • 2601.00575 • Published Jan 2 • 3

Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Paper • 2512.24617 • Published Dec 31, 2025 • 65

A unified framework for detecting point and collective anomalies in operating system logs via collaborative transformers

Paper • 2512.23380 • Published Dec 29, 2025 • 44

Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems

Paper • 2512.24385 • Published Dec 30, 2025 • 8

Scaling Open-Ended Reasoning to Predict the Future

Paper • 2512.25070 • Published Dec 31, 2025 • 19

Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process

Paper • 2512.23988 • Published Dec 30, 2025 • 18

Detecting Anomalies in Machine Learning Infrastructure via Hardware Telemetry

Paper • 2510.26008 • Published Oct 29, 2025

CodeLSI: Leveraging Foundation Models for Automated Code Generation with Low-Rank Optimization and Domain-Specific Instruction Tuning

Paper • 2509.14373 • Published Sep 17, 2025

Big data analysis and distributed deep learning for next-generation intrusion detection system optimization

Paper • 2209.13961 • Published Sep 28, 2022

Qwen/DeepPlanning

Viewer • Updated 1 day ago • 2.14k • 1.17k • 187

ManCAR: Manifold-Constrained Latent Reasoning with Adaptive Test-Time Computation for Sequential Recommendation

Paper • 2602.20093 • Published 5 days ago • 28

tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction

Paper • 2602.20160 • Published 5 days ago • 8