arxiv:2604.02268

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

Published on Apr 2 · Submitted by Zhengxi Lu on Apr 3

Abstract

SKILL0 enables LLM agents to internalize skills during training, allowing zero-shot autonomous behavior through a dynamic curriculum that reduces contextual overhead while improving task performance.

AI-generated summary

Agent skills, structured packages of procedural knowledge and executable resources that agents dynamically load at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidance, injected skill content imposes substantial token overhead, and the model never truly acquires the knowledge it merely follows. We ask whether skills can instead be internalized into model parameters, enabling zero-shot autonomous behavior without any runtime skill retrieval. We introduce SKILL0, an in-context reinforcement learning framework designed for skill internalization. SKILL0 introduces a training-time curriculum that begins with full skill context and progressively withdraws it. Skills are grouped offline by category and rendered with interaction history into a compact visual context, teaching the model tool invocation and multi-turn task completion. A Dynamic Curriculum then evaluates each skill file's on-policy helpfulness, retaining only those from which the current policy still benefits within a linearly decaying budget, until the agent operates in a fully zero-shot setting. Extensive agentic experiments demonstrate that SKILL0 achieves substantial improvements over the standard RL baseline (+9.7% for ALFWorld and +6.6% for Search-QA), while maintaining a highly efficient context of fewer than 0.5k tokens per step. Our code is available at https://github.com/ZJU-REAL/SkillZero.
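The Dynamic Curriculum described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the function names, the skill file names, and the helpfulness metric (mean on-policy reward with the skill in context minus mean reward without it) are all assumptions made for illustration. It only shows the general shape of the idea: a linearly decaying budget that caps how many skill files stay in context, and a filter that keeps only skills the current policy still benefits from.

```python
from typing import Dict, List


def skill_budget(step: int, total_steps: int, num_skills: int) -> int:
    """Linearly decaying cap on how many skill files may stay in context.

    Hypothetical schedule: all skills allowed at step 0, none at total_steps.
    """
    if total_steps <= 0:
        return 0
    remaining = max(0.0, 1.0 - step / total_steps)
    return int(num_skills * remaining)  # truncate to a whole number of skills


def helpfulness(with_skill: List[float], without_skill: List[float]) -> float:
    """Assumed metric: mean reward gain from having the skill in context."""
    mean_with = sum(with_skill) / len(with_skill)
    mean_without = sum(without_skill) / len(without_skill)
    return mean_with - mean_without


def select_skills(scores: Dict[str, float], budget: int) -> List[str]:
    """Keep skills with positive helpfulness, most helpful first, within budget."""
    helpful = [name for name, score in scores.items() if score > 0]
    helpful.sort(key=lambda name: scores[name], reverse=True)
    return helpful[:budget]


if __name__ == "__main__":
    # Hypothetical skill files and helpfulness scores for one curriculum step.
    scores = {"pick_and_place.md": 0.20, "heat_object.md": -0.10, "search_web.md": 0.05}
    budget = skill_budget(step=50, total_steps=100, num_skills=len(scores))
    print(budget, select_skills(scores, budget))
```

Once the budget reaches zero, `select_skills` returns an empty list and the agent runs with no skill context, which corresponds to the fully zero-shot setting the abstract describes.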

Community

Paper author Paper submitter
  1. We propose SKILL0, the first RL framework that formulates skill internalization as an explicit training objective, moving agents from inference-time skill dependence to fully autonomous zero-shot behavior.
  2. We introduce in-context reinforcement learning, which provides structured skill guidance during training rollouts and removes it entirely at inference, directly optimizing the transition from context-dependent execution to intrinsic competence.
  3. We propose Dynamic Curriculum, a helpfulness-driven annealing mechanism that withdraws each skill only when the current policy no longer benefits from it, replacing rigid schedules with adaptive internalization.

the in-context skill internalization approach here is interesting, kind of flips the script on how we usually think about agents learning new abilities. found a solid writeup that breaks down how it works https://arxivexplained.com/papers/skill0-in-context-agentic-reinforcement-learning-for-skill-internalization

The dynamic curriculum approach to skill withdrawal is clever; I've seen similar issues where retrieval noise degrades agent performance when irrelevant skills get injected into context. The token efficiency (<0.5k per step) at inference is compelling for production deployments where context budget matters. Curious: did you observe specific skill categories that resisted internalization more than others? In my experience, procedural skills (API calls, tool sequences) tend to compress well, but reasoning-heavy skills often need more explicit scaffolding.
