Papers
arxiv:2604.04949

Learning to Retrieve from Agent Trajectories

Published on Mar 30
· Submitted by
Sunhao Dai
on Apr 8
#3 Paper of the day

Abstract

Retrieval models for agentic search should be trained directly from agent interaction data using a new paradigm that mines supervision from multi-step agent trajectories and incorporates relevance intensity through weighted optimization.

AI-generated summary

Information retrieval (IR) systems have traditionally been designed and trained for human users, with learning-to-rank methods relying heavily on large-scale human interaction logs such as clicks and dwell time. With the rapid emergence of large language model (LLM) powered search agents, however, retrieval is increasingly consumed by agents rather than human beings, and is embedded as a core component within multi-turn reasoning and action loops. In this setting, retrieval models trained under human-centric assumptions exhibit a fundamental mismatch with the way agents issue queries and consume results. In this work, we argue that retrieval models for agentic search should be trained directly from agent interaction data. We introduce learning to retrieve from agent trajectories as a new training paradigm, where supervision is derived from multi-step agent interactions. Through a systematic analysis of search agent trajectories, we identify key behavioral signals that reveal document utility, including browsing actions, unbrowsed rejections, and post-browse reasoning traces. Guided by these insights, we propose LRAT, a simple yet effective framework that mines high-quality retrieval supervision from agent trajectories and incorporates relevance intensity through weighted optimization. Extensive experiments on both in-domain and out-of-domain deep research benchmarks demonstrate that retrievers trained with LRAT consistently improve evidence recall, end-to-end task success, and execution efficiency across diverse agent architectures and scales. Our results highlight agent trajectories as a practical and scalable supervision source, pointing to a promising direction for retrieval in the era of agentic search.

Community

Comment from the paper author and submitter:

Key insights:

  1. We identify a fundamental misalignment between human-centric retrieval training and agentic search, and formulate learning to retrieve from agent trajectories as a new retrieval paradigm. In this setting, supervision is derived from multi-step agent interactions, reflecting how the search tool is actually used by search agents.

  2. Guided by insights from empirical analysis, we propose LRAT, a simple yet effective framework that mines high-quality retrieval supervision from agent trajectories, providing a practical step toward agent-aligned retriever training.

  3. Experiments on both in-domain and out-of-domain deep research benchmarks show that LRAT consistently improves evidence retrieval and end-to-end agent performance across diverse agent architectures and scales. We further demonstrate that LRAT can support a self-improving data flywheel, underscoring its scalability in real-world scenarios.

the core idea of learning to retrieve from agent trajectories is compelling, and i like how they treat unbrowsed rejections and post-browse reasoning traces as signals of relevance. the weighted optimization for relevance intensity feels like a clean way to translate messy agent behavior into a ranking objective that actually aligns with end-to-end task success. one open question is what happens when trajectories are biased by suboptimal planning or exploration shortcuts—would the method reinforce biased documents? btw the arxivlens breakdown (https://arxivlens.com/PaperView/Details/learning-to-retrieve-from-agent-trajectories-18-8aefbff7) does a nice job unpacking the signal extraction in the method, which helped me connect the dots.

Learning to Retrieve from Agent Trajectories

LRAT (Learning to Retrieve from Agent Trajectories) addresses a fundamental mismatch: existing retrieval models are trained on human search behavior, but LLM agents search very differently. Humans issue a single query and click a result; agents engage in multi-step browsing loops with queries, browsing actions, rejections, and reasoning steps. LRAT mines supervision directly from these multi-step agent interactions to train retrieval models that actually match how agents search. The work introduces relevance intensity weighting to capture the nuanced signals embedded in agent trajectories. From Renmin University and CAS.

Key Idea

Retrieval models trained for human users assume a simple query-then-click pattern. LLM agents, however, search through iterative multi-turn loops: querying, browsing documents, rejecting irrelevant ones, reasoning about what they have found, and re-querying with refined terms. This fundamental behavioral mismatch means standard retrieval models serve agents poorly, motivating a new approach that learns from how agents actually interact with search results.

[Figure: human vs. agent search behavior]

Method / Approach

LRAT extracts three types of supervision signals from agent trajectories: browsing actions (documents the agent chose to read, indicating positive relevance), unbrowsed rejections (documents the agent saw but skipped, indicating negative relevance), and post-browse reasoning traces (the agent's internal reasoning about document quality after reading). These signals are combined using relevance intensity weighting, which assigns graded relevance scores rather than binary labels, capturing the rich spectrum of document utility revealed by agent behavior.

[Figure: supervision signals mined from agent trajectories]
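To make the three signal types concrete, here is a minimal sketch of mining graded supervision from a single trajectory. The trajectory schema (action names, field keys) and the specific weight values are our assumptions for illustration, not the paper's actual data format:

```python
def mine_labels(trajectory):
    """Map agent actions onto doc_id -> (label, weight) supervision.

    - browsed docs: positive relevance
    - unbrowsed rejections (shown in results, never opened): negative
    - post-browse reasoning that cites a doc: boosted positive weight,
      standing in for "relevance intensity"
    """
    labels = {}
    for step in trajectory:
        if step["action"] == "browse":
            labels[step["doc_id"]] = ("pos", 1.0)
        elif step["action"] == "skip":
            # Negative only if the agent never later browsed this doc.
            labels.setdefault(step["doc_id"], ("neg", 1.0))
        elif step["action"] == "reason" and step.get("cited_doc"):
            doc = step["cited_doc"]
            if doc in labels and labels[doc][0] == "pos":
                labels[doc] = ("pos", 2.0)  # intensity boost (assumed value)
    return labels
```

A document the agent both browsed and later cited in its reasoning thus ends up with a stronger positive weight than one it merely opened, which is the graded-relevance idea the paper describes.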

Results

The full LRAT pipeline flows from collecting agent trajectories, through signal mining across the three supervision channels, to producing weighted relevance labels that train a specialized retriever. The resulting retrieval model better serves agent-style search patterns, creating a virtuous cycle: better retrieval leads to more efficient agent trajectories, which in turn yield richer training signals.

[Figure: LRAT training pipeline]
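One plausible reading of "relevance intensity through weighted optimization" is a per-example weight applied to a standard contrastive retrieval objective. The exact loss in LRAT may differ; this stdlib-only sketch just shows how graded weights would scale the training signal:

```python
import math

def weighted_contrastive_loss(pos_score, neg_scores, intensity=1.0):
    """Softmax cross-entropy of one positive against sampled negatives,
    scaled by the positive's relevance intensity (assumed weighting form)."""
    logits = [pos_score] + list(neg_scores)
    m = max(logits)  # subtract max for numerical stability
    log_denom = math.log(sum(math.exp(s - m) for s in logits))
    log_prob_pos = (pos_score - m) - log_denom
    return -intensity * log_prob_pos
```

Under this formulation, a document with intensity 2.0 (e.g. browsed and cited in reasoning) contributes twice the gradient signal of an ordinary positive, so the retriever is pulled harder toward documents the agent actually used.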

