Translator models for Make it SING (https://huggingface.co/papers/2603.14610)

Their purpose is to translate the penultimate-layer features of a classifier into the image embedding space of CLIP. This repo currently supports the karlo-v1-alpha variation (https://huggingface.co/kakaobrain/karlo-v1-alpha), but the translators can easily be retrained for any other. All models were trained with an MSE loss via ridge regression in the scikit-learn library and then migrated to PyTorch.
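The training-and-migration recipe above can be sketched as follows. This is an illustrative reconstruction, not the repo's actual training script; the feature and embedding dimensions (2048 and 768) and the use of `sklearn.linear_model.Ridge` are assumptions:

```python
# Fit a ridge regression from classifier features to CLIP image embeddings,
# then copy the learned weights into a PyTorch nn.Linear for inference.
import numpy as np
import torch
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2048)).astype(np.float32)   # penultimate-layer features (dims assumed)
Y = rng.normal(size=(256, 768)).astype(np.float32)    # CLIP image embeddings (dim assumed)

ridge = Ridge(alpha=1.0)
ridge.fit(X, Y)

# Migrate to PyTorch: Ridge stores coef_ as (out_dim, in_dim), which matches
# the weight layout of nn.Linear(in_dim, out_dim).
translator = torch.nn.Linear(2048, 768)
with torch.no_grad():
    translator.weight.copy_(torch.from_numpy(np.asarray(ridge.coef_)))
    translator.bias.copy_(torch.from_numpy(np.asarray(ridge.intercept_)))

# The PyTorch module now reproduces the scikit-learn prediction.
pred_torch = translator(torch.from_numpy(X)).detach().numpy()
pred_sklearn = ridge.predict(X)
```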

Translators Layout

Expected structure:

translators/
  registry.yaml
  resnet/
    linear/
      metadata.yaml
      best.pt
  dinovit1/
    linear/
      metadata.yaml
      best.pt

Required metadata fields:

  • model_name
  • translator_name
  • architecture
  • embedding_backend (based on https://huggingface.co/kakaobrain/karlo-v1-alpha)
  • in_dim (positive integer)
  • out_dim (positive integer)
  • hidden_dim (optional; positive integer; defaults to in_dim)
  • checkpoint_file (relative path only; no absolute paths or ..)
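Putting the required fields together, a metadata.yaml for the resnet/linear translator in the layout above might look like this. All values here are hypothetical placeholders, not the repo's actual configuration:

```yaml
# Hypothetical metadata.yaml for translators/resnet/linear/
model_name: resnet              # must match the model directory name
translator_name: linear         # must match the translator directory name
architecture: linear
embedding_backend: kakaobrain/karlo-v1-alpha
in_dim: 2048                    # positive integer
out_dim: 768                    # positive integer
hidden_dim: 2048                # optional; defaults to in_dim
checkpoint_file: best.pt        # relative path only
```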

Validation notes:

  • Metadata schema is strict: unknown keys are rejected.
  • model_name and translator_name inside metadata.yaml must match their directory names.
  • architecture must be one of: linear, 3layer, 4layer, residual.
  • Registry lookup is case-insensitive by model name and rejects ambiguous case-collisions (for example both ResNet and resnet).
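The case-insensitive lookup with collision rejection could be implemented along these lines. This is a minimal sketch of the described behavior, not the repo's actual code; the function name and registry shape are assumptions:

```python
# Case-insensitive registry lookup that rejects ambiguous case-collisions
# (e.g. both "ResNet" and "resnet" registered under different keys).
def lookup(registry: dict, model_name: str):
    matches = [k for k in registry if k.lower() == model_name.lower()]
    if len(matches) > 1:
        raise ValueError(f"ambiguous case-collision for {model_name!r}: {matches}")
    if not matches:
        raise KeyError(model_name)
    return registry[matches[0]]

registry = {"resnet": "translators/resnet", "dinovit1": "translators/dinovit1"}
print(lookup(registry, "ResNet"))  # resolves despite the different casing
```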