Image-Text-to-Text
• 0.2B • Updated
• 226
• 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with
Reinforcement Learning
Paper
• 2503.09516
• Published
• 38
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper
• 2505.24863
• Published
• 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with
Reinforcement Learning
Paper
• 2505.17667
• Published
• 88
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in
Large Language Models
Paper
• 2505.24864
• Published
• 144
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for
Language Reasoning
Paper
• 2505.24298
• Published
• 28
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient
Fine-Tuning
Paper
• 2505.20355
• Published
• 36
Interleaved Reasoning for Large Language Models via Reinforcement
Learning
Paper
• 2505.19640
• Published
• 15
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering
Workflow
Paper
• 2505.17399
• Published
• 14
Enigmata: Scaling Logical Reasoning in Large Language Models with
Synthetic Verifiable Puzzles
Paper
• 2505.19914
• Published
• 46
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper
• 2505.18129
• Published
• 62
Scaling Reasoning, Losing Control: Evaluating Instruction Following in
Large Reasoning Models
Paper
• 2505.14810
• Published
• 62
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement
Learning
Paper
• 2505.16410
• Published
• 58
JULI: Jailbreak Large Language Models by Self-Introspection
Paper
• 2505.11790
• Published
• 1
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Paper
• 2505.13438
• Published
• 36
Paper
• 2505.14674
• Published
• 37
RM-R1: Reward Modeling as Reasoning
Paper
• 2505.02387
• Published
• 81
CPGD: Toward Stable Rule-based Reinforcement Learning for Language
Models
Paper
• 2505.12504
• Published
• 24
Neuro-Symbolic Query Compiler
Paper
• 2505.11932
• Published
• 18
Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time
Reward Alignment in Score Models
Paper
• 2506.01320
• Published
• 16
Aligning Latent Spaces with Flow Priors
Paper
• 2506.05240
• Published
• 27
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in
Robotics
Paper
• 2506.00070
• Published
• 29
A Controllable Examination for Long-Context Language Models
Paper
• 2506.02921
• Published
• 33
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal
LLMs
Paper
• 2506.01674
• Published
• 28
CodeContests+: High-Quality Test Case Generation for Competitive
Programming
Paper
• 2506.05817
• Published
• 9
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal
Contextual Fusion
Paper
• 2506.01111
• Published
• 31
Reinforcement Pre-Training
Paper
• 2506.08007
• Published
• 263
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection
Behavior
Paper
• 2506.08012
• Published
• 7
Dreamland: Controllable World Creation with Simulator and Generative
Models
Paper
• 2506.08006
• Published
• 7
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety
Assurance
Paper
• 2506.06444
• Published
• 73
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation
Paper
• 2506.07530
• Published
• 20
Solving Inequality Proofs with Large Language Models
Paper
• 2506.07927
• Published
• 20
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand
Better
Paper
• 2506.09040
• Published
• 34
Through the Valley: Path to Effective Long CoT Training for Small
Language Models
Paper
• 2506.07712
• Published
• 18
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports
From Scratch with Agentic Framework
Paper
• 2506.02454
• Published
• 7
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error
Diagnosis in GUI Automation
Paper
• 2506.04614
• Published
• 19
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal
Learning
Paper
• 2506.06205
• Published
• 30
Paper
• 2506.10910
• Published
• 66
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just
Like an Olympiad Team
Paper
• 2506.14234
• Published
• 41
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time
Markers
Paper
• 2506.14702
• Published
• 3
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Paper
• 2506.06962
• Published
• 28
DoTA-RAG: Dynamic of Thought Aggregation RAG
Paper
• 2506.12571
• Published
• 50
syftr: Pareto-Optimal Generative AI
Paper
• 2505.20266
• Published
Scaling Test-time Compute for LLM Agents
Paper
• 2506.12928
• Published
• 63
LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware
LoRA Fine-Tuning
Paper
• 2506.10082
• Published
• 8
General-Reasoner: Advancing LLM Reasoning Across All Domains
Paper
• 2505.14652
• Published
• 24
Optimizing Length Compression in Large Reasoning Models
Paper
• 2506.14755
• Published
• 10
UniFork: Exploring Modality Alignment for Unified Multimodal
Understanding and Generation
Paper
• 2506.17202
• Published
• 10
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought
Reasoning in LLMs
Paper
• 2506.18896
• Published
• 29
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal
Document Understanding
Paper
• 2506.16035
• Published
• 89
Robust Reward Modeling via Causal Rubrics
Paper
• 2506.16507
• Published
• 9
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with
TriMap Video Diffusion
Paper
• 2507.02813
• Published
• 60
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
Paper
• 2507.01953
• Published
• 18
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective
Paper
• 2506.17930
• Published
• 19
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via
Multi-Agent Multi-Turn Reinforcement Learning
Paper
• 2506.24119
• Published
• 50
katanemo/Arch-Router-1.5B
Text Generation
• Updated
• 2.5k
• 246
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs
More Realistic and Less Risky
Paper
• 2507.03336
• Published
• 7
SingLoRA: Low Rank Adaptation Using a Single Matrix
Paper
• 2507.05566
• Published
• 115
CriticLean: Critic-Guided Reinforcement Learning for Mathematical
Formalization
Paper
• 2507.06181
• Published
• 45
AutoTriton: Automatic Triton Programming with Reinforcement Learning in
LLMs
Paper
• 2507.05687
• Published
• 30
Coding Triangle: How Does Large Language Model Understand Code?
Paper
• 2507.06138
• Published
• 22
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based
Reinforcement Learning
Paper
• 2507.05920
• Published
• 12
RefineX: Learning to Refine Pre-training Data at Scale from
Expert-Guided Programs
Paper
• 2507.03253
• Published
• 19
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
Paper
• 2507.07996
• Published
• 36
Lumos-1: On Autoregressive Video Generation from a Unified Model
Perspective
Paper
• 2507.08801
• Published
• 31
A Survey of Context Engineering for Large Language Models
Paper
• 2507.13334
• Published
• 261
WebShaper: Agentically Data Synthesizing via Information-Seeking
Formalization
Paper
• 2507.15061
• Published
• 60
AnyCap Project: A Unified Framework, Dataset, and Benchmark for
Controllable Omni-modal Captioning
Paper
• 2507.12841
• Published
• 42
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper
• 2507.16784
• Published
• 122
MUR: Momentum Uncertainty guided Reasoning for Large Language Models
Paper
• 2507.14958
• Published
• 47
Does More Inference-Time Compute Really Help Robustness?
Paper
• 2507.15974
• Published
• 7
RefCritic: Training Long Chain-of-Thought Critic Models with Refinement
Feedback
Paper
• 2507.15024
• Published
• 14
ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via
Gaussian Splatting
Paper
• 2507.15454
• Published
• 7
Promptomatix: An Automatic Prompt Optimization Framework for Large
Language Models
Paper
• 2507.14241
• Published
• 18
TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive
Generation
Paper
• 2507.18537
• Published
• 18
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human
Videos
Paper
• 2507.15597
• Published
• 34
A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning
Paper
• 2507.14295
• Published
• 14
SeC: Advancing Complex Video Object Segmentation via Progressive Concept
Construction
Paper
• 2507.15852
• Published
• 38
FLEXITOKENS: Flexible Tokenization for Evolving Language Models
Paper
• 2507.12720
• Published
• 10
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA
Optimization
Paper
• 2507.12142
• Published
• 37
Replacing thinking with tool usage enables reasoning in small language
models
Paper
• 2507.05065
• Published
• 16
Lizard: An Efficient Linearization Framework for Large Language Models
Paper
• 2507.09025
• Published
• 19
MemOS: A Memory OS for AI System
Paper
• 2507.03724
• Published
• 159
Agentic Reinforced Policy Optimization
Paper
• 2507.19849
• Published
• 158
Deep Researcher with Test-Time Diffusion
Paper
• 2507.16075
• Published
• 68
SmallThinker: A Family of Efficient Large Language Models Natively
Trained for Local Deployment
Paper
• 2507.20984
• Published
• 58
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI
Agents
Paper
• 2507.19478
• Published
• 32
Geometric-Mean Policy Optimization
Paper
• 2507.20673
• Published
• 32
UloRL:An Ultra-Long Output Reinforcement Learning Approach for Advancing
Large Language Models' Reasoning Abilities
Paper
• 2507.19766
• Published
• 15
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced
Multimodal Reasoning
Paper
• 2507.22607
• Published
• 47
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language
Models
Paper
• 2508.00819
• Published
• 63
Beyond the Trade-off: Self-Supervised Reinforcement Learning for
Reasoning Models' Instruction Following
Paper
• 2508.02150
• Published
• 37
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent
Foundation Models Training
Paper
• 2508.00414
• Published
• 94
On the Expressiveness of Softmax Attention: A Recurrent Neural Network
Perspective
Paper
• 2507.23632
• Published
• 6
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
Paper
• 2507.23726
• Published
• 115
SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic
Association and Long Story Comprehension
Paper
• 2508.01959
• Published
• 59
Tool-integrated Reinforcement Learning for Repo Deep Search
Paper
• 2508.03012
• Published
• 20
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy
Optimization
Paper
• 2508.05731
• Published
• 27
MeshLLM: Empowering Large Language Models to Progressively Understand
and Generate 3D Mesh
Paper
• 2508.01242
• Published
• 11
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Paper
• 2508.08221
• Published
• 50
Reinforcement Learning in Vision: A Survey
Paper
• 2508.08189
• Published
• 30
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with
Patch-level CLIP Latents
Paper
• 2508.05954
• Published
• 6
Feedback-Driven Tool-Use Improvements in Large Language Models via
Automated Build Environments
Paper
• 2508.08791
• Published
• 16
Training Long-Context, Multi-Turn Software Engineering Agents with
Reinforcement Learning
Paper
• 2508.03501
• Published
• 59
Complex Logical Instruction Generation
Paper
• 2508.09125
• Published
• 40
Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery
Paper
• 2508.08401
• Published
• 42
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion
Forcing
Paper
• 2508.09192
• Published
• 30
Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision
Mapping
Paper
• 2508.12466
• Published
• 8
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study
Paper
• 2508.13142
• Published
• 34
VertexRegen: Mesh Generation with Continuous Level of Detail
Paper
• 2508.09062
• Published
• 38
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache
Rematerialization
Paper
• 2508.10395
• Published
• 42
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
Paper
• 2508.10893
• Published
• 31
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Paper
• 2508.11987
• Published
• 71
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper
• 2508.13186
• Published
• 19
Pass@k Training for Adaptively Balancing Exploration and Exploitation of
Large Reasoning Models
Paper
• 2508.10751
• Published
• 29
UI-Venus Technical Report: Building High-performance UI Agents with RFT
Paper
• 2508.10833
• Published
• 45
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
Paper
• 2508.09968
• Published
• 15
CRINN: Contrastive Reinforcement Learning for Approximate Nearest
Neighbor Search
Paper
• 2508.02091
• Published
• 13
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on
Challenging Queries
Paper
• 2508.15760
• Published
• 47
Deep Think with Confidence
Paper
• 2508.15260
• Published
• 90
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference
Optimization
Paper
• 2508.14460
• Published
• 85
Quantization Meets dLLMs: A Systematic Study of Post-training
Quantization for Diffusion LLMs
Paper
• 2508.14896
• Published
• 22
PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent
LLMs
Paper
• 2508.17188
• Published
• 17
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement
Learning for General LLM Reasoning
Paper
• 2508.16949
• Published
• 24
Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance
for Text-to-Image Generation
Paper
• 2508.18032
• Published
• 41
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory
and Test-Time Compute Scaling
Paper
• 2508.16745
• Published
• 29
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior
Long-Context Learning
Paper
• 2508.18756
• Published
• 36
Do What? Teaching Vision-Language-Action Models to Reject the Impossible
Paper
• 2508.16292
• Published
• 9
MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian
Splatting
Paper
• 2508.17811
• Published
• 7
FastMesh:Efficient Artistic Mesh Generation via Component Decoupling
Paper
• 2508.19188
• Published
• 17
Spacer: Towards Engineered Scientific Inspiration
Paper
• 2508.17661
• Published
• 32
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large
Language Models
Paper
• 2508.18773
• Published
• 16
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D
Space
Paper
• 2508.19247
• Published
• 43
TreePO: Bridging the Gap of Policy Optimization and Efficacy and
Inference Efficiency with Heuristic Tree-based Modeling
Paper
• 2508.17445
• Published
• 80
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable
Text-to-Image Reinforcement Learning
Paper
• 2508.20751
• Published
• 89
Provable Benefits of In-Tool Learning for Large Language Models
Paper
• 2508.20755
• Published
• 11
Think in Games: Learning to Reason in Games via Reinforcement Learning
with Large Language Models
Paper
• 2508.21365
• Published
• 29
Efficient Code Embeddings from Code Generation Models
Paper
• 2508.21290
• Published
• 19
CLIPSym: Delving into Symmetry Detection with CLIP
Paper
• 2508.14197
• Published
• 8
Implicit Actor Critic Coupling via a Supervised Learning Framework for
RLVR
Paper
• 2509.02522
• Published
• 26
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn
Tool-Integrated Reasoning
Paper
• 2509.02479
• Published
• 84
Universal Deep Research: Bring Your Own Model and Strategy
Paper
• 2509.00244
• Published
• 14
LMEnt: A Suite for Analyzing Knowledge in Language Models from
Pretraining Data to Representations
Paper
• 2509.03405
• Published
• 24
Open Data Synthesis For Deep Research
Paper
• 2509.00375
• Published
• 72
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow
Real Instructions?
Paper
• 2509.04292
• Published
• 58
Towards a Unified View of Large Language Model Post-Training
Paper
• 2509.04419
• Published
• 76
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex
Dynamic Environment? A Study on τ-bench
Paper
• 2508.20931
• Published
• 16
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
Paper
• 2509.03059
• Published
• 25
NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware
Embeddings
Paper
• 2509.04011
• Published
• 29
Symbolic Graphics Programming with Large Language Models
Paper
• 2509.05208
• Published
• 47
Bootstrapping Task Spaces for Self-Improvement
Paper
• 2509.04575
• Published
• 6
Behavioral Fingerprinting of Large Language Models
Paper
• 2509.04504
• Published
• 6
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM
Step-Provers
Paper
• 2509.06493
• Published
• 12
Reinforcement Learning Foundations for Deep Research Systems: A Survey
Paper
• 2509.06733
• Published
• 32
Reconstruction Alignment Improves Unified Multimodal Models
Paper
• 2509.07295
• Published
• 40
Visual Representation Alignment for Multimodal Large Language Models
Paper
• 2509.07979
• Published
• 84
Revolutionizing Reinforcement Learning Framework for Diffusion Large
Language Models
Paper
• 2509.06949
• Published
• 56
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper
• 2509.07980
• Published
• 105
Towards General Agentic Intelligence via Environment Scaling
Paper
• 2509.13311
• Published
• 72
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic
Data and Scalable Reinforcement Learning
Paper
• 2509.13305
• Published
• 91
SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based
Instruction Dataset Creation
Paper
• 2509.10708
• Published
• 18
HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented
Generation for Multi-hop Question Answering
Paper
• 2509.09713
• Published
• 25
FlowRL: Matching Reward Distributions for LLM Reasoning
Paper
• 2509.15207
• Published
• 116
Single-stream Policy Optimization
Paper
• 2509.13232
• Published
• 34
World Modeling with Probabilistic Structure Integration
Paper
• 2509.09737
• Published
• 14
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via
Machine Unlearning
Paper
• 2509.13755
• Published
• 19
C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving
Reasoning
Paper
• 2507.16518
• Published
• 2
WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model
via Training-Free Guidance
Paper
• 2509.15130
• Published
• 30
Evolving Language Models without Labels: Majority Drives Selection,
Novelty Promotes Variation
Paper
• 2509.15194
• Published
• 33
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical
Reasoning
Paper
• 2509.13761
• Published
• 16
A Vision-Language-Action-Critic Model for Robotic Real-World
Reinforcement Learning
Paper
• 2509.15937
• Published
• 20
BaseReward: A Strong Baseline for Multimodal Reward Model
Paper
• 2509.16127
• Published
• 21
MultiEdit: Advancing Instruction-based Image Editing on Diverse and
Challenging Tasks
Paper
• 2509.14638
• Published
• 13
Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided
Role-playing Agents
Paper
• 2509.15233
• Published
• 2
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid
Vision Tokenizer
Paper
• 2509.16197
• Published
• 58
Latent Zoning Network: A Unified Principle for Generative Modeling,
Representation Learning, and Classification
Paper
• 2509.15591
• Published
• 45
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
Paper
• 2509.15566
• Published
• 14
Paper
• 2509.17336
• Published
• 10
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary
Feedback
Paper
• 2501.10799
• Published
• 15
Table as Thought: Exploring Structured Thoughts in LLM Reasoning
Paper
• 2501.02152
• Published
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
Paper
• 2412.09078
• Published
TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge
Internalization with Self-Reflection
Paper
• 2412.08024
• Published
• 1
LLM2: Let Large Language Models Harness System 2 Reasoning
Paper
• 2412.20372
• Published
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large
Language Models via a Multi-Paradigm Perspective
Paper
• 2501.11110
• Published
• 4
Ensembling Large Language Models with Process Reward-Guided Tree Search
for Better Complex Reasoning
Paper
• 2412.15797
• Published
• 18
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented
Verification and Refinement
Paper
• 2412.12881
• Published
• 2
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion
Transformer Models
Paper
• 2509.17627
• Published
• 66
Hyper-Bagel: A Unified Acceleration Framework for Multimodal
Understanding and Generation
Paper
• 2509.18824
• Published
• 23
Understanding the Thinking Process of Reasoning Models: A Perspective
from Schoenfeld's Episode Theory
Paper
• 2509.14662
• Published
• 13
VCRL: Variance-based Curriculum Reinforcement Learning for Large
Language Models
Paper
• 2509.19803
• Published
• 120
Tree Search for LLM Agent Reinforcement Learning
Paper
• 2509.21240
• Published
• 92
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable
Sparse-Linear Attention
Paper
• 2509.24006
• Published
• 118
Fine-tuning Done Right in Model Editing
Paper
• 2509.22072
• Published
• 28
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM
Reinforcement Learning via Entropy-Guided Advantage Shaping
Paper
• 2509.21880
• Published
• 53
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale
Diffusion Transformer
Paper
• 2509.22414
• Published
• 22
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive
Exploration for Agentic Reinforcement Learning
Paper
• 2509.22601
• Published
• 30
EPO: Entropy-regularized Policy Optimization for LLM Agents
Reinforcement Learning
Paper
• 2509.22576
• Published
• 135
Variational Reasoning for Language Models
Paper
• 2509.22637
• Published
• 69
AutoIntent: AutoML for Text Classification
Paper
• 2509.21138
• Published
• 36
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
Paper
• 2509.25760
• Published
• 55
Attention as a Compass: Efficient Exploration for Process-Supervised RL
in Reasoning Models
Paper
• 2509.26628
• Published
• 17
Sequential Diffusion Language Models
Paper
• 2509.24007
• Published
• 46
ReviewScore: Misinformed Peer Review Detection with Large Language
Models
Paper
• 2509.21679
• Published
• 64
ReviewRL: Towards Automated Scientific Review with RL
Paper
• 2508.10308
• Published
• 1
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks
Paper
• 2508.15804
• Published
• 15
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with
Verifiable Rewards via Monte Carlo Tree Search
Paper
• 2509.25454
• Published
• 146
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget
Allocation
Paper
• 2509.25849
• Published
• 48
BroRL: Scaling Reinforcement Learning via Broadened Exploration
Paper
• 2510.01180
• Published
• 20
GEM: A Gym for Agentic LLMs
Paper
• 2510.01051
• Published
• 90
Interactive Training: Feedback-Driven Neural Network Optimization
Paper
• 2510.02297
• Published
• 43
More Thought, Less Accuracy? On the Dual Nature of Reasoning in
Vision-Language Models
Paper
• 2509.25848
• Published
• 80
CLUE: Non-parametric Verification from Experience via Hidden-State
Clustering
Paper
• 2510.01591
• Published
• 28
LongCodeZip: Compress Long Context for Code Language Models
Paper
• 2510.00446
• Published
• 107
Efficient Multi-modal Large Language Models via Progressive Consistency
Distillation
Paper
• 2510.00515
• Published
• 42
Reactive Transformer (RxT) -- Stateful Real-Time Processing for
Event-Driven Reactive Language Models
Paper
• 2510.03561
• Published
• 25
Large Language Models as Optimizers
Paper
• 2309.03409
• Published
• 79
Connecting Large Language Models with Evolutionary Algorithms Yields
Powerful Prompt Optimizers
Paper
• 2309.08532
• Published
• 54
PanGu-Coder2: Boosting Large Language Models for Code with Ranking
Feedback
Paper
• 2307.14936
• Published
• 42
Factuality Matters: When Image Generation and Editing Meet Structured
Visuals
Paper
• 2510.05091
• Published
• 20
Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM
Training
Paper
• 2510.04996
• Published
• 16
Paper2Video: Automatic Video Generation from Scientific Papers
Paper
• 2510.05096
• Published
• 119
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior
Reasoning LLMs
Paper
• 2510.05069
• Published
• 13
MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual
Information
Paper
• 2510.03632
• Published
• 42
Large Reasoning Models Learn Better Alignment from Flawed Thinking
Paper
• 2510.00938
• Published
• 59
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published
• 509
Multi-Agent Tool-Integrated Policy Optimization
Paper
• 2510.04678
• Published
• 31
Agent Learning via Early Experience
Paper
• 2510.08558
• Published
• 273
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large
Multimodal Models
Paper
• 2510.05034
• Published
• 51
Low-probability Tokens Sustain Exploration in Reinforcement Learning
with Verifiable Reward
Paper
• 2510.03222
• Published
• 75
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning
for LLMs
Paper
• 2510.11696
• Published
• 181
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
Paper
• 2510.09507
• Published
• 11
Agentic Context Engineering: Evolving Contexts for Self-Improving
Language Models
Paper
• 2510.04618
• Published
• 129
Better Together: Leveraging Unpaired Multimodal Data for Stronger
Unimodal Models
Paper
• 2510.08492
• Published
• 10
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Paper
• 2510.09577
• Published
• 8
BigCodeArena: Unveiling More Reliable Human Preferences in Code
Generation via Execution
Paper
• 2510.08697
• Published
• 39
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for
MLLMs
Paper
• 2510.09201
• Published
• 50
Diffusion Transformers with Representation Autoencoders
Paper
• 2510.11690
• Published
• 166
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
Paper
• 2510.13515
• Published
• 12
Advancing End-to-End Pixel Space Generative Modeling via Self-supervised
Pre-training
Paper
• 2510.12586
• Published
• 113
Understanding DeepResearch via Reports
Paper
• 2510.07861
• Published
• 7
RAG-Anything: All-in-One RAG Framework
Paper
• 2510.12323
• Published
• 67
The Art of Scaling Reinforcement Learning Compute for LLMs
Paper
• 2510.13786
• Published
• 32
Glyph: Scaling Context Windows via Visual-Text Compression
Paper
• 2510.17800
• Published
• 68
LoongRL:Reinforcement Learning for Advanced Reasoning over Long Contexts
Paper
• 2510.19363
• Published
• 62
Unified Reinforcement and Imitation Learning for Vision-Language Models
Paper
• 2510.19307
• Published
• 32
Attention Is All You Need for KV Cache in Diffusion LLMs
Paper
• 2510.14973
• Published
• 42
Information Gain-based Policy Optimization: A Simple and Effective
Approach for Multi-Turn LLM Agents
Paper
• 2510.14967
• Published
• 34
Video Reasoning without Training
Paper
• 2510.17045
• Published
• 8
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative
Decoders
Paper
• 2510.19779
• Published
• 61
Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall
Paper
• 2510.19304
• Published
• 24
Every Question Has Its Own Value: Reinforcement Learning with Explicit
Human Values
Paper
• 2510.20187
• Published
• 19
ReCode: Unify Plan and Action for Universal Granularity Control
Paper
• 2510.23564
• Published
• 122
Reasoning with Sampling: Your Base Model is Smarter Than You Think
Paper
• 2510.14901
• Published
• 48
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement
Learning
Paper
• 2510.23473
• Published
• 85
World Simulation with Video Foundation Models for Physical AI
Paper
• 2511.00062
• Published
• 44
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid
Validation in Realistic Workflows
Paper
• 2510.24411
• Published
• 72
The End of Manual Decoding: Towards Truly End-to-End Language Models
Paper
• 2510.26697
• Published
• 117
The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms
Paper
• 2511.04217
• Published
• 17
Diffusion Language Models are Super Data Learners
Paper
• 2511.03276
• Published
• 129
Scaling Latent Reasoning via Looped Language Models
Paper
• 2510.25741
• Published
• 229
DRIVE: Data Curation Best Practices for Reinforcement Learning with
Verifiable Reward in Competitive Code Generation
Paper
• 2511.06307
• Published
• 53
Black-Box On-Policy Distillation of Large Language Models
Paper
• 2511.10643
• Published
• 52
DoPE: Denoising Rotary Position Embedding
Paper
• 2511.09146
• Published
• 97
Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution
Paper
• 2511.14210
• Published
• 21
SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models
Paper
• 2511.15605
• Published
• 24
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
Paper
• 2511.16664
• Published
• 28
TiDAR: Think in Diffusion, Talk in Autoregression
Paper
• 2511.08923
• Published
• 128
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning
Paper
• 2511.06805
• Published
• 13
The Path Not Taken: RLVR Provably Learns Off the Principals
Paper
• 2511.08567
• Published
• 34
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise
Reasoning
Paper
• 2510.25992
• Published
• 48
FARMER: Flow AutoRegressive Transformer over Pixels
Paper
• 2510.23588
• Published
• 59
Parallel Loop Transformer for Efficient Test-Time Computation Scaling
Paper
• 2510.24824
• Published
• 17
LLM-guided Hierarchical Retrieval
Paper
• 2510.13217
• Published
• 21
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per
Token via Reinforcement Learning
Paper
• 2510.15110
• Published
• 17
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal
Evidence
Paper
• 2510.20579
• Published
• 56
GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms
Paper
• 2511.17592
• Published
• 119
Paper
• 2511.11238
• Published
• 38
Flow Map Distillation Without Data
Paper
• 2511.19428
• Published
• 6
Monet: Reasoning in Latent Visual Space Beyond Images and Language
Paper
• 2511.21395
• Published
• 18
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
Paper
• 2511.19773
• Published
• 10
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space
Paper
• 2511.20102
• Published
• 28
Architecture Decoupling Is Not All You Need For Unified Multimodal Model
Paper
• 2511.22663
• Published
• 29
SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs
Paper
• 2512.00722
• Published
• 16
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper
• 2512.01374
• Published
• 105
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
Paper
• 2512.02014
• Published
• 73
OneThinker: All-in-one Reasoning Model for Image and Video
Paper
• 2512.03043
• Published
• 33
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Paper
• 2512.05591
• Published
• 17
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Paper
• 2512.05150
• Published
• 76
UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers
Paper
• 2512.04504
• Published
• 18
On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral
Paper
• 2512.04220
• Published
• 16
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling
Paper
• 2512.03000
• Published
• 37
PromptBridge: Cross-Model Prompt Transfer for Large Language Models
Paper
• 2512.01420
• Published
• 11
PretrainZero: Reinforcement Active Pretraining
Paper
• 2512.03442
• Published
• 48
SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment
Paper
• 2512.02807
• Published
• 9
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
Paper
• 2512.04926
• Published
• 42
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning
Paper
• 2512.07461
• Published
• 78
Distribution Matching Variational AutoEncoder
Paper
• 2512.07778
• Published
• 29
TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
Paper
• 2512.08153
• Published
• 8
InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models
Paper
• 2512.08829
• Published
• 21
Self-Improving VLM Judges Without Human Annotations
Paper
• 2512.05145
• Published
• 20
Rethinking Training Dynamics in Scale-wise Autoregressive Generation
Paper
• 2512.06421
• Published
• 7
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory
Paper
• 2512.07802
• Published
• 46
unsloth/Devstral-2-123B-Instruct-2512-GGUF
125B • Updated
• 12.4k
• 47
Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning
Paper
• 2512.10534
• Published
• 32
BEAVER: An Efficient Deterministic LLM Verifier
Paper
• 2512.05439
• Published
• 36
Vector Quantization using Gaussian Variational Autoencoder
Paper
• 2512.06609
• Published
• 1
Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
Paper
• 2512.07525
• Published
• 59
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
Paper
• 2511.23386
• Published
• 16
Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving
Paper
• 2512.10739
• Published
• 47
OmniPSD: Layered PSD Generation with Diffusion Transformer
Paper
• 2512.09247
• Published
• 48
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
Paper
• 2512.13586
• Published
• 93
KlingAvatar 2.0 Technical Report
Paper
• 2512.13313
• Published
• 43
Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed
Paper
• 2512.14067
• Published
• 16
Towards Scalable Pre-training of Visual Tokenizers for Generation
Paper
• 2512.13687
• Published
• 106
HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices
Paper
• 2512.14052
• Published
• 42
Universal Reasoning Model
Paper
• 2512.14693
• Published
• 43
Image Diffusion Preview with Consistency Solver
Paper
• 2512.13592
• Published
• 8
End-to-End Training for Autoregressive Video Diffusion via Self-Resampling
Paper
• 2512.15702
• Published
• 16
STeCa: Step-level Trajectory Calibration for LLM Agent Learning
Paper
• 2502.14276
• Published
• 1
Step-GUI Technical Report
Paper
• 2512.15431
• Published
• 132
Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification
Paper
• 2512.16921
• Published
• 8
Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection
Paper
• 2512.16905
• Published
• 32
DiffusionBrowser: Interactive Diffusion Previews via Multi-Branch Decoders
Paper
• 2512.13690
• Published
• 3
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
Paper
• 2512.13607
• Published
• 36
Paper
• 2512.16301
• Published
• 107
QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management
Paper
• 2512.12967
• Published
• 108
CoSPlan: Corrective Sequential Planning via Scene Graph Incremental Updates
Paper
• 2512.10342
• Published
• 1
UAGLNet: Uncertainty-Aggregated Global-Local Fusion Network with Cooperative CNN-Transformer for Building Extraction
Paper
• 2512.12941
• Published
• 2
TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning
Paper
• 2512.13106
• Published
• 4
Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation
Paper
• 2512.13655
• Published
• 3
Janus: Disaggregating Attention and Experts for Scalable MoE Inference
Paper
• 2512.13525
• Published
• 6
RePo: Language Models with Context Re-Positioning
Paper
• 2512.14391
• Published
• 12
VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse
Paper
• 2512.14531
• Published
• 15
ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement
Paper
• 2512.13303
• Published
• 17
Differentiable Evolutionary Reinforcement Learning
Paper
• 2512.13399
• Published
• 22
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives
Paper
• 2512.14699
• Published
• 28
RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics
Paper
• 2512.13660
• Published
• 37
MMGR: Multi-Modal Generative Reasoning
Paper
• 2512.14691
• Published
• 119
Hybrid Attribution Priors for Explainable and Robust Model Training
Paper
• 2512.14719
• Published
• 3
WAY: Estimation of Vessel Destination in Worldwide AIS Trajectory
Paper
• 2512.13190
• Published
• 8
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
Paper
• 2512.19693
• Published
• 66
Reinforcement Learning for Self-Improving Agent with Skill Library
Paper
• 2512.17102
• Published
• 36
Updated
• 71
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
Paper
• 2512.19673
• Published
• 64
QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation
Paper
• 2512.19134
• Published
• 32
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper
• 2512.16676
• Published
• 219
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
Paper
• 2512.16969
• Published
• 119
LongVideoAgent: Multi-Agent Reasoning with Long Videos
Paper
• 2512.20618
• Published
• 55
Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs
Paper
• 2512.17206
• Published
• 20
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
Paper
• 2512.17008
• Published
• 11
InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training
Paper
• 2510.15859
• Published
• 13
Fast and Accurate Causal Parallel Decoding using Jacobi Forcing
Paper
• 2512.14681
• Published
• 42
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
Paper
• 2512.17351
• Published
• 28
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
Paper
• 2512.15687
• Published
• 21
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Paper
• 2512.13874
• Published
• 17
SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
Paper
• 2512.14080
• Published
• 9
Understanding and Improving Hyperbolic Deep Reinforcement Learning
Paper
• 2512.14202
• Published
• 6
SCOPE: Prompt Evolution for Enhancing Agent Effectiveness
Paper
• 2512.15374
• Published
• 6
VOYAGER: A Training Free Approach for Generating Diverse Datasets using LLMs
Paper
• 2512.12072
• Published
• 17
DEER: Draft with Diffusion, Verify with Autoregressive Models
Paper
• 2512.15176
• Published
• 45
TabReX : Tabular Referenceless eXplainable Evaluation
Paper
• 2512.15907
• Published
• 2
Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers
Paper
• 2512.16615
• Published
• 5
AdaTooler-V: Adaptive Tool-Use for Images and Videos
Paper
• 2512.16918
• Published
• 14
REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion
Paper
• 2512.16636
• Published
• 26
Kling-Omni Technical Report
Paper
• 2512.16776
• Published
• 170
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Paper
• 2512.20605
• Published
• 62
Multi-hop Reasoning via Early Knowledge Alignment
Paper
• 2512.20144
• Published
• 7
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Paper
• 2512.19995
• Published
• 16
TimeBill: Time-Budgeted Inference for Large Language Models
Paper
• 2512.21859
• Published
• 25
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone
Paper
• 2512.22615
• Published
• 49
Training AI Co-Scientists Using Rubric Rewards
Paper
• 2512.23707
• Published
• 21
Masking Teacher and Reinforcing Student for Distilling Vision-Language Models
Paper
• 2512.22238
• Published
• 30
LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics
Paper
• 2512.21010
• Published
• 4
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper
• 2512.23447
• Published
• 98
mHC: Manifold-Constrained Hyper-Connections
Paper
• 2512.24880
• Published
• 312
Evaluating Parameter Efficient Methods for RLVR
Paper
• 2512.23165
• Published
• 28
SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent
Paper
• 2511.16108
• Published
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models
Paper
• 2601.03425
• Published
• 16
MMFormalizer: Multimodal Autoformalization in the Wild
Paper
• 2601.03017
• Published
• 105
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs
Paper
• 2601.03559
• Published
• 14
Token-Level LLM Collaboration via FusionRoute
Paper
• 2601.05106
• Published
• 40
VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
Paper
• 2601.05175
• Published
• 36
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking
Paper
• 2601.06487
• Published
• 53
Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers
Paper
• 2601.04890
• Published
• 42
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
Paper
• 2601.07832
• Published
• 52
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper
• 2601.05242
• Published
• 228
RelayLLM: Efficient Reasoning via Collaborative Decoding
Paper
• 2601.05167
• Published
• 31
LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning
Paper
• 2601.10129
• Published
• 12
Language of Thought Shapes Output Diversity in Large Language Models
Paper
• 2601.11227
• Published
• 9
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
Paper
• 2601.11004
• Published
• 30
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
Paper
• 2601.15892
• Published
• 53
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
Paper
• 2601.15165
• Published
• 72
Learning to Discover at Test Time
Paper
• 2601.16175
• Published
• 42
ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought
Paper
• 2601.23184
• Published
• 36
AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders
Paper
• 2602.05027
• Published
• 60
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning
Paper
• 2602.11748
• Published
• 30
Paper
• 2602.11298
• Published
• 16
DFlash: Block Diffusion for Flash Speculative Decoding
Paper
• 2602.06036
• Published
• 42
InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions
Paper
• 2602.06035
• Published
• 23
Experiential Reinforcement Learning
Paper
• 2602.13949
• Published
• 68
REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents
Paper
• 2602.14234
• Published
• 26
Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality
Paper
• 2602.14080
• Published
• 20
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers
Paper
• 2602.15322
• Published
• 9
OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models
Paper
• 2305.12001
• Published
• 1
SELF: Language-Driven Self-Evolution for Large Language Model
Paper
• 2310.00533
• Published
• 2
DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation
Paper
• 2601.22904
• Published
• 15
EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots
Paper
• 2602.18071
• Published
• 22
VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation
Paper
• 2601.02256
• Published
• 33
GARDO: Reinforcing Diffusion Models without Reward Hacking
Paper
• 2512.24138
• Published
• 29
Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling
Paper
• 2601.02346
• Published
• 26
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
Paper
• 2512.20578
• Published
• 85
Recursive Language Models
Paper
• 2512.24601
• Published
• 90
COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs
Paper
• 2601.01836
• Published
• 10
Toward Stable Semi-Supervised Remote Sensing Segmentation via Co-Guidance and Co-Fusion
Paper
• 2512.23035
• Published
• 5
Confidence Estimation for LLMs in Multi-turn Interactions
Paper
• 2601.02179
• Published
• 17
SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving
Paper
• 2601.01426
• Published
• 24
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment
Paper
• 2601.01576
• Published
• 18
Project Ariadne: A Structural Causal Framework for Auditing Faithfulness in LLM Agents
Paper
• 2601.02314
• Published
• 2
M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models
Paper
• 2512.22877
• Published
• 2
Nested Learning: The Illusion of Deep Learning Architectures
Paper
• 2512.24695
• Published
• 44
Paper
• 2601.00417
• Published
• 34
The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving
Paper
• 2601.00747
• Published
• 20
InfoSynth: Information-Guided Benchmark Synthesis for LLMs
Paper
• 2601.00575
• Published
• 3
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
Paper
• 2512.24617
• Published
• 65
A unified framework for detecting point and collective anomalies in operating system logs via collaborative transformers
Paper
• 2512.23380
• Published
• 44
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
Paper
• 2512.24385
• Published
• 8
Scaling Open-Ended Reasoning to Predict the Future
Paper
• 2512.25070
• Published
• 19
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper
• 2512.23988
• Published
• 18
Detecting Anomalies in Machine Learning Infrastructure via Hardware Telemetry
Paper
• 2510.26008
• Published
CodeLSI: Leveraging Foundation Models for Automated Code Generation with Low-Rank Optimization and Domain-Specific Instruction Tuning
Paper
• 2509.14373
• Published
Big data analysis and distributed deep learning for next-generation intrusion detection system optimization
Paper
• 2209.13961
• Published
• Updated
• 2.14k • 1.17k
• 187
ManCAR: Manifold-Constrained Latent Reasoning with Adaptive Test-Time Computation for Sequential Recommendation
Paper
• 2602.20093
• Published
• 28
tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction
Paper
• 2602.20160
• Published
• 8