Make it SING: Analyzing Semantic Invariants in Classifiers
Paper • 2603.14610 • Published • 16
Their purpose is to translate the penultimate-layer activations of a classifier into the image embedding space of CLIP. This repo currently supports the [karlo-v1-alpha](https://huggingface.co/kakaobrain/karlo-v1-alpha) variant, but the translators can easily be retrained for any other backend. All models were trained with an MSE objective via ridge regression using the scikit-learn library and then migrated to PyTorch.
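As a minimal sketch of that recipe, here is the closed-form ridge fit written in pure NumPy (a stand-in for the scikit-learn `Ridge` fit described above); the dimensions, synthetic data, and regularization strength are illustrative, not the repo's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: penultimate-layer features -> embedding space.
in_dim, out_dim, n = 8, 4, 256
W_true = rng.normal(size=(in_dim, out_dim))   # "ground truth" linear map
X = rng.normal(size=(n, in_dim))              # synthetic classifier features
Y = X @ W_true                                # synthetic target embeddings

# Closed-form ridge regression: W = (X^T X + lam*I)^{-1} X^T Y.
lam = 1e-6
W = np.linalg.solve(X.T @ X + lam * np.eye(in_dim), X.T @ Y)

# The learned W can then be copied into a linear layer (e.g. torch.nn.Linear)
# for inference, which is the "migrated to PyTorch" step.
err = np.abs(X @ W - Y).max()
```

Because the translators are linear (or small MLPs), this fit-then-copy migration preserves the model exactly up to numerical precision.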
Expected structure:
```
translators/
    registry.yaml
    resnet/
        linear/
            metadata.yaml
            best.pt
    dinovit1/
        linear/
            metadata.yaml
            best.pt
```
Each `metadata.yaml` defines the following fields:

- `model_name`
- `translator_name`
- `architecture` (one of: `linear`, `3layer`, `4layer`, `residual`)
- `embedding_backend` (based on https://huggingface.co/kakaobrain/karlo-v1-alpha)
- `in_dim` (positive integer)
- `out_dim` (positive integer)
- `hidden_dim` (optional; positive integer; defaults to `in_dim`)
- `checkpoint_file` (relative path only; no absolute paths or `..`)

`model_name` and `translator_name` inside `metadata.yaml` must match their directory names, which are case-sensitive (`ResNet` and `resnet` are different).