Translator models for Make it SING (https://huggingface.co/papers/2603.14610)

Their purpose is to translate the penultimate-layer features of a classifier into the image embedding space of CLIP. This repo currently supports the karlo-v1-alpha variation (https://huggingface.co/kakaobrain/karlo-v1-alpha), but the translators can easily be retrained for any other. All models were trained with an MSE loss via ridge regression in the scikit-learn library and then migrated to PyTorch.
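The training-and-migration recipe above can be sketched as follows. This is an illustrative reconstruction, not the repo's actual training script; the feature and embedding dimensions (2048 and 768) and the use of `sklearn.linear_model.Ridge` are assumptions:

```python
# Fit a ridge regression from classifier features to CLIP image embeddings,
# then copy the learned weights into a PyTorch nn.Linear for inference.
import numpy as np
import torch
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2048)).astype(np.float32)   # penultimate-layer features (dims assumed)
Y = rng.normal(size=(256, 768)).astype(np.float32)    # CLIP image embeddings (dim assumed)

ridge = Ridge(alpha=1.0)
ridge.fit(X, Y)

# Migrate to PyTorch: Ridge stores coef_ as (out_dim, in_dim), which matches
# the weight layout of nn.Linear(in_dim, out_dim).
translator = torch.nn.Linear(2048, 768)
with torch.no_grad():
    translator.weight.copy_(torch.from_numpy(np.asarray(ridge.coef_)))
    translator.bias.copy_(torch.from_numpy(np.asarray(ridge.intercept_)))

# The PyTorch module now reproduces the scikit-learn prediction.
pred_torch = translator(torch.from_numpy(X)).detach().numpy()
pred_sklearn = ridge.predict(X)
```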

Translators Layout

Expected structure:

translators/
  registry.yaml
  resnet/
    linear/
      metadata.yaml
      best.pt
  dinovit1/
    linear/
      metadata.yaml
      best.pt

Required metadata fields:

  • model_name
  • translator_name
  • architecture
  • embedding_backend (based on https://huggingface.co/kakaobrain/karlo-v1-alpha)
  • in_dim (positive integer)
  • out_dim (positive integer)
  • hidden_dim (optional; positive integer; defaults to in_dim)
  • checkpoint_file (relative path only; no absolute paths or ..)
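Putting the required fields together, a metadata.yaml for the resnet/linear translator in the layout above might look like this. All values here are hypothetical placeholders, not the repo's actual configuration:

```yaml
# Hypothetical metadata.yaml for translators/resnet/linear/
model_name: resnet              # must match the model directory name
translator_name: linear         # must match the translator directory name
architecture: linear
embedding_backend: kakaobrain/karlo-v1-alpha
in_dim: 2048                    # positive integer
out_dim: 768                    # positive integer
hidden_dim: 2048                # optional; defaults to in_dim
checkpoint_file: best.pt        # relative path only
```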

Validation notes:

  • Metadata schema is strict: unknown keys are rejected.
  • model_name and translator_name inside metadata.yaml must match their directory names.
  • architecture must be one of: linear, 3layer, 4layer, residual.
  • Registry lookup is case-insensitive by model name and rejects ambiguous case-collisions (for example both ResNet and resnet).
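The case-insensitive lookup with collision rejection could be implemented along these lines. This is a minimal sketch of the described behavior, not the repo's actual code; the function name and registry shape are assumptions:

```python
# Case-insensitive registry lookup that rejects ambiguous case-collisions
# (e.g. both "ResNet" and "resnet" registered under different keys).
def lookup(registry: dict, model_name: str):
    matches = [k for k in registry if k.lower() == model_name.lower()]
    if len(matches) > 1:
        raise ValueError(f"ambiguous case-collision for {model_name!r}: {matches}")
    if not matches:
        raise KeyError(model_name)
    return registry[matches[0]]

registry = {"resnet": "translators/resnet", "dinovit1": "translators/dinovit1"}
print(lookup(registry, "ResNet"))  # resolves despite the different casing
```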