DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units
HuBERT VP-20 is a HuBERT base model pretrained for the DiscoPhon benchmark on a 6k-hour, 20-language subset of VoxPopuli (all EU languages except English, French, and German).
It was pretrained using the minimal_hubert library.
You can load it with Hugging Face Transformers:

```python
from transformers import HubertModel

model = HubertModel.from_pretrained("coml/hubert-base-vp20")
```
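Once loaded, the model maps 16 kHz waveforms to 768-dimensional frame features at roughly 50 Hz. A minimal shape-check sketch; it uses a randomly initialized base-sized model so it runs without downloading weights (swap in `from_pretrained("coml/hubert-base-vp20")` to get real features):

```python
import torch
from transformers import HubertConfig, HubertModel

# Randomly initialized base-sized HuBERT, just to illustrate input/output shapes.
model = HubertModel(HubertConfig()).eval()

wav = torch.zeros(1, 16000)  # 1 s of 16 kHz audio: (batch, samples)
with torch.no_grad():
    hidden = model(wav).last_hidden_state

# The conv frontend downsamples by a factor of 320, so 1 s of audio
# yields ~50 frames of 768-dim features.
print(hidden.shape)  # torch.Size([1, 49, 768])
```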
Or with minimal_hubert:

```python
from minimal_hubert import HuBERT, HuBERTPretrain

# Standard model
model = HuBERT.from_pretrained("coml/hubert-base-vp20")

# With pretraining head for classification
model_for_pretraining = HuBERTPretrain.from_pretrained(
    "https://huggingface.co/coml/hubert-base-vp20/resolve/main/it2.pt"
)
```
Check out minimal_hubert if you are interested in pretraining or in loading HuBERT checkpoints from other libraries.
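The k-means checkpoints shipped with this model (e.g. `km256-it2-l11.joblib`) turn continuous HuBERT features into discrete units by assigning each frame to its nearest cluster centroid. A minimal sketch of that quantization step; it fits a toy k-means on random features so it runs standalone (in practice you would `joblib.load` one of the repository's `.joblib` files and feed it features from the corresponding layer):

```python
import joblib
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 768)).astype(np.float32)  # (frames, dim), stand-in

# Toy quantizer; the repo's checkpoints were fit the same way, on real features.
km = KMeans(n_clusters=8, n_init=1, random_state=0).fit(features)
joblib.dump(km, "toy-km.joblib")

km_loaded = joblib.load("toy-km.joblib")  # same call works for km256-it2-l11.joblib
units = km_loaded.predict(features)       # one discrete unit id per frame
print(units.shape)  # (200,)
```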
Files:

- `model.safetensors` and `config.json`: Hugging Face Transformers checkpoint and config.
- `it1.pt`: 1st-iteration checkpoint.
- `it2.pt`: 2nd-iteration checkpoint, converted to a Hugging Face state dict to produce `model.safetensors`.
- `km100-mfcc.joblib`: k-means trained on MFCCs of VoxPopuli-20; used to train the 1st iteration.
- `km500-it1-l10.joblib`: k-means trained on features from the 10th layer of the 1st-iteration model; used to train the 2nd iteration.
- `km256-it2-l11.joblib`: k-means trained on features from the 11th layer of the 2nd-iteration model; used for DiscoPhon finetuning.

Citation:

```bibtex
@misc{poli2026discophon,
  title={{DiscoPhon}: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units},
  author={Maxime Poli and Manel Khentout and Angelo Ortiz Tandazo and Ewan Dunbar and Emmanuel Chemla and Emmanuel Dupoux},
  year={2026},
  eprint={2603.18612},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.18612},
}
```