---
tags:
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
- feature-extraction
- code-search
- knowledge-distillation
- modernbert
- apple-silicon
- mps
pipeline_tag: sentence-similarity
library_name: PyLate
license: apache-2.0
language:
- en
datasets:
- sentence-transformers/codesearchnet
base_model: lightonai/ColBERT-Zero
---
# ColBERT-Zero-6L-CodeSearch

A **6-layer ColBERT model** distilled from [ColBERT-Zero](https://huggingface.co/lightonai/ColBERT-Zero) (22 layers) for code search, achieving **85% of the teacher's retrieval quality at 13x faster query speed**.

## Model Details

| Parameter | Value |
|-----------|-------|
| **Architecture** | ModernBERT (6 layers, 768 hidden, 12 heads) |
| **Base Model** | [lightonai/ColBERT-Zero](https://huggingface.co/lightonai/ColBERT-Zero) |
| **Output Dimensionality** | 128 per-token embeddings |
| **Similarity Function** | MaxSim (late interaction) |
| **Parameters** | ~38M (vs ~100M teacher) |
| **Query Length** | 32 tokens |
| **Document Length** | 180 tokens |
| **License** | Apache 2.0 |
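
For reference, MaxSim (late interaction) scores a query against a document by taking, for each query token embedding, its highest similarity against all document token embeddings and summing those maxima. A minimal NumPy sketch of the idea (illustrative only, not PyLate's implementation):

```python
import numpy as np

def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """MaxSim between one query and one document.

    query_emb: (num_query_tokens, 128), doc_emb: (num_doc_tokens, 128).
    Assumes per-token embeddings are already L2-normalized, as ColBERT models emit.
    """
    sim = query_emb @ doc_emb.T           # cosine similarities, shape (q_tokens, d_tokens)
    return float(sim.max(axis=1).sum())   # best document match per query token, summed
```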
## Benchmark Results

Evaluated on 3 code search corpora (150 questions total) via [litembeddings](https://github.com/alexandernicholson/litembeddings):

| Corpus | Teacher MRR | Student MRR | % of Teacher | Student Query Speed |
|--------|------------|-------------|--------------|---------------------|
| jq (C) | 0.539 | 0.355 | 65.9% | ~7ms |
| Rails (Ruby) | 0.679 | 0.581 | 85.6% | ~3ms |
| FastAPI (Python) | 0.782 | 0.766 | **98.0%** | ~4ms |
| **Aggregate** | **0.667** | **0.568** | **85.1%** | **~5ms** |

The student model is approximately **13x faster** at query time than the teacher while retaining 85% of retrieval quality. Performance is particularly strong on Python code search (98% of teacher).
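
MRR here is mean reciprocal rank: for each question, take the reciprocal of the rank at which the first relevant result appears, then average over all questions. A small illustrative helper (not the litembeddings evaluation code):

```python
def mean_reciprocal_rank(ranked_ids_per_query, relevant_id_per_query):
    """ranked_ids_per_query: one ranked list of result ids per query.
    relevant_id_per_query: the single relevant id expected for each query."""
    total = 0.0
    for ranked_ids, relevant_id in zip(ranked_ids_per_query, relevant_id_per_query):
        for position, doc_id in enumerate(ranked_ids, start=1):
            if doc_id == relevant_id:
                total += 1.0 / position
                break  # a query whose relevant doc is never retrieved contributes 0
    return total / len(ranked_ids_per_query)
```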
## How the Student Was Built

### Architecture: Layer Pruning from Teacher

The student was created by selecting 6 layers from ColBERT-Zero's 22-layer ModernBERT backbone using a **skewed-late** strategy that preserves more upper layers (which encode retrieval-relevant semantics):

```
Teacher layers: [0, 1, 2, ..., 21]      (22 total)
Student layers: [0, 8, 14, 17, 19, 21]  (6 selected)
```

The student inherits:

- All embedding weights from the teacher
- The 768-to-128 ColBERT projection layer
- Selected transformer layers with full weight copying
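
A rough sketch of the pruning step, assuming the teacher's ModernBERT backbone can be loaded as a plain `transformers` model that exposes its blocks as `.layers` (in the real pipeline the backbone sits inside the PyLate/sentence-transformers wrapper, so the exact attribute paths may differ):

```python
import copy

import torch
from transformers import AutoModel

KEEP = [0, 8, 14, 17, 19, 21]  # skewed-late selection from the 22-layer teacher

# Hypothetical direct load of the backbone; adjust to however the teacher checkpoint is stored
teacher = AutoModel.from_pretrained("lightonai/ColBERT-Zero")

student = copy.deepcopy(teacher)
# Keep only the selected transformer blocks, copying their weights verbatim;
# embeddings (and, in the full model, the 768->128 ColBERT projection) carry over unchanged
student.layers = torch.nn.ModuleList([copy.deepcopy(teacher.layers[i]) for i in KEEP])
student.config.num_hidden_layers = len(KEEP)
```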
### Training: Knowledge Distillation

- **Dataset**: [CodeSearchNet](https://huggingface.co/datasets/sentence-transformers/codesearchnet) (10,000 comment-code pairs)
- **Teacher scoring**: ColBERT-Zero generates MaxSim relevance scores for each query against 1 positive + 3 random negative documents
- **Loss**: PyLate Distillation loss (KL divergence between teacher and student score distributions)
- **Optimizer**: AdamW, lr=5e-5, weight_decay=0.01, warmup_ratio=0.1
- **Training**: 1000 steps, batch_size=8, gradient_accumulation=4 (effective batch size 32)
- **Hardware**: Apple Silicon (M4 Max) via PyTorch MPS backend, ~17 minutes total
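
A condensed sketch of what such a PyLate distillation run can look like. The dataset path, its column layout, and the output directory are placeholders, and the real pipeline also has to export the teacher's MaxSim scores for the 1 positive + 3 negatives per query before training:

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from pylate import losses, models, utils

# The pruned 6-layer student from the previous step (hypothetical local path)
model = models.ColBERT(model_name_or_path="path/to/pruned-student")

# Rows are expected to pair each query with its candidate documents and teacher scores
train_dataset = load_dataset("json", data_files="kd_pairs.jsonl", split="train")

args = SentenceTransformerTrainingArguments(
    output_dir="colbert-zero-6l-codesearch",
    max_steps=1000,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,  # effective batch size 32
    learning_rate=5e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=losses.Distillation(model=model),  # distills teacher score distributions into the student
    data_collator=utils.ColBERTCollator(tokenize_fn=model.tokenize),
)
trainer.train()
```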
### Hyperparameter Search

The optimal configuration was found through **30 autonomous experiments** sweeping learning rate, layer selection strategy, batch size, gradient accumulation, weight decay, warmup ratio, number of negatives, training steps, and embedding dimensions. Key findings:

- **Teacher initialization is critical**: starting from ColBERT-Zero's weights (MRR 0.46) vs raw ModernBERT (MRR 0.08) — a 5.6x improvement
- **Skewed-late layer selection** outperforms evenly-spaced, last-6, and other strategies
- **Effective batch size 32** (bs=8, grad_accum=4) is optimal
- **Weight decay 0.01** provides a regularization benefit
## Usage

### Installation

```bash
pip install pylate
```

### Encoding & Retrieval

```python
from pylate import models
from pylate.scores import colbert_scores

# Load model
model = models.ColBERT(model_name_or_path="ctrltokyo/ColBERT-Zero-6L-CodeSearch")

# Encode documents
doc_embeddings = model.encode(
    ["def hello():\n print('Hello, World!')", "class UserAuth:\n ..."],
    batch_size=32,
    is_query=False,
    show_progress_bar=True,
)

# Encode queries
query_embeddings = model.encode(
    ["function that prints a greeting"],
    batch_size=32,
    is_query=True,
    show_progress_bar=True,
)

# Score with MaxSim
scores = colbert_scores(query_embeddings, doc_embeddings)
print(scores)  # Higher = more relevant
```
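
The block above scores query/document pairs directly. For retrieval over a larger corpus, PyLate's standard index-and-retrieve workflow can be used with this model as well; a sketch (index folder, document ids, and `k` are placeholders):

```python
from pylate import indexes, models, retrieve

model = models.ColBERT(model_name_or_path="ctrltokyo/ColBERT-Zero-6L-CodeSearch")

# Index the document embeddings once
index = indexes.Voyager(index_folder="pylate-index", index_name="codesearch", override=True)
documents_ids = ["doc1", "doc2"]
documents = ["def hello():\n print('Hello, World!')", "class UserAuth:\n ..."]
documents_embeddings = model.encode(documents, is_query=False)
index.add_documents(documents_ids=documents_ids, documents_embeddings=documents_embeddings)

# Retrieve the top-k documents for each query
retriever = retrieve.ColBERT(index=index)
queries_embeddings = model.encode(["function that prints a greeting"], is_query=True)
results = retriever.retrieve(queries_embeddings=queries_embeddings, k=2)
print(results)  # per query: ranked list of {"id": ..., "score": ...}
```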
### Reranking

```python
from pylate import rank, models

model = models.ColBERT(model_name_or_path="ctrltokyo/ColBERT-Zero-6L-CodeSearch")

queries = ["how to authenticate users"]
documents = [["def login(user, pwd): ...", "def sort_list(arr): ...", "class AuthMiddleware: ..."]]
documents_ids = [["doc1", "doc2", "doc3"]]

queries_embeddings = model.encode(queries, is_query=True)
documents_embeddings = model.encode(documents, is_query=False)

reranked = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```
## GGUF / litembeddings

This model can be converted to GGUF format for use with [litembeddings](https://github.com/alexandernicholson/litembeddings) (SQLite-based embedding engine with SIMD-accelerated MaxSim):

```bash
# Convert to GGUF
python convert_hf_to_gguf.py ctrltokyo/ColBERT-Zero-6L-CodeSearch --outfile model-f16.gguf --outtype f16

# Extract projection
python -c "
from safetensors import safe_open
import numpy as np
f = safe_open('1_Dense/model.safetensors', framework='numpy')
f.get_tensor('linear.weight').astype(np.float32).tofile('model.projection')
"
```

Then in SQL:

```sql
SELECT lembed_model('codesearch', 'model-f16.gguf', '{"colbert_projection": "model.projection"}');
SELECT lembed_maxsim(
  lembed_tokens('search_query: how to sort a list'),
  lembed_tokens('search_document: def quicksort(arr): ...')
);
```
## Limitations

- **Weakest on C code search** (65.9% of teacher on jq corpus) — likely because CodeSearchNet training data is Python-heavy
- **Trained on 10k pairs only** — larger training sets or hard negative mining could improve quality further
- **English only** — inherits ColBERT-Zero's language capabilities
- **No asymmetric prompts** — unlike the teacher, this model does not use `search_query:`/`search_document:` prompts (uses `[Q]`/`[D]` prefixes instead)
## Citation

```bibtex
@misc{colbert-zero-6l-codesearch,
  title={ColBERT-Zero-6L-CodeSearch: A Distilled ColBERT Model for Code Search},
  author={Alexander Nicholson},
  year={2026},
  note={Distilled from ColBERT-Zero (Chaffin et al., 2026) using PyLate on Apple Silicon}
}
```
## Acknowledgments

- [ColBERT-Zero](https://huggingface.co/lightonai/ColBERT-Zero) by LightOn AI — the teacher model
- [PyLate](https://github.com/lightonai/pylate) — ColBERT training framework
- [litembeddings](https://github.com/alexandernicholson/litembeddings) — SQLite embedding engine used for benchmarking
- Training and experimentation performed entirely on Apple Silicon (M4 Max) using PyTorch MPS backend