Instructions for using chromadb/context-1 with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use chromadb/context-1 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="chromadb/context-1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("chromadb/context-1")
model = AutoModelForCausalLM.from_pretrained("chromadb/context-1")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use chromadb/context-1 with vLLM:

Install from pip and serve the model

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "chromadb/context-1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "chromadb/context-1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker

```shell
docker model run hf.co/chromadb/context-1
```
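Both the vLLM and SGLang servers expose the OpenAI-compatible `/v1/chat/completions` endpoint, so the curl call above can also be made from any HTTP client. A minimal sketch using only the Python standard library, assuming the vLLM server above is listening on `localhost:8000` (for the SGLang server, change the port to 30000):

```python
# Query a locally running OpenAI-compatible server (vLLM or SGLang).
import json
import urllib.request


def build_payload(question: str) -> dict:
    """Build the chat-completions request body for a single user message."""
    return {
        "model": "chromadb/context-1",
        "messages": [{"role": "user", "content": question}],
    }


def ask(question: str, base_url: str = "http://localhost:8000/v1") -> str:
    """Send the request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
#   print(ask("What is the capital of France?"))
```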
- SGLang
How to use chromadb/context-1 with SGLang:

Install from pip and serve the model

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "chromadb/context-1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "chromadb/context-1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "chromadb/context-1" \
  --host 0.0.0.0 \
  --port 30000
```

The server can then be called with the same curl command as above.

- Docker Model Runner
How to use chromadb/context-1 with Docker Model Runner:
```shell
docker model run hf.co/chromadb/context-1
```
Chroma Context-1
Context-1 is a 20B parameter agentic search model trained to retrieve supporting documents for complex, multi-hop queries. It is designed to be used as a retrieval subagent alongside a frontier reasoning model: given a query, Context-1 decomposes it into subqueries, iteratively searches a corpus, and selectively edits its own context to free capacity for further exploration.
Context-1 achieves retrieval performance comparable to frontier LLMs at a fraction of the cost and with up to 10x faster inference.
Technical report: Chroma Context-1: Training a Self-Editing Search Agent
Model Details
- Base model: gpt-oss-20b
- Parameters: 20B (Mixture of Experts)
- Training: SFT + RL (CISPO) with a staged curriculum
- Precision: BF16 (MXFP4 quantized checkpoint coming soon)
Key Capabilities
- Query decomposition: Breaks complex multi-constraint questions into targeted subqueries.
- Parallel tool calling: Averages 2.56 tool calls per turn, reducing total turns and end-to-end latency.
- Self-editing context: Selectively prunes irrelevant documents mid-search to sustain retrieval quality over long horizons within a bounded context window (0.94 prune accuracy).
- Cross-domain generalization: Trained on web, legal, and finance tasks; generalizes to held-out domains and public benchmarks (BrowseComp-Plus, SealQA, FRAMES, HLE).
Important: Agent Harness Required
Context-1 is trained to operate within a specific agent harness that manages tool execution, token budgets, context pruning, and deduplication. The harness is not yet public. Running the model without it will not reproduce the results reported in the technical report.
We plan to release the full agent harness and evaluation code soon. In the meantime, the technical report describes the harness design in detail.
Citation
```bibtex
@techreport{bashir2026context1,
  title       = {Chroma Context-1: Training a Self-Editing Search Agent},
  author      = {Bashir, Hammad and Hong, Kelly and Jiang, Patrick and Shi, Zhiyi},
  year        = {2026},
  month       = {March},
  institution = {Chroma},
  url         = {https://trychroma.com/research/context-1},
}
```
License
Apache 2.0