Open Machine Translation for Esperanto
This repository contains a multilingual MarianMT model with a tiny architecture for translation between (English, Spanish, Catalan) and Esperanto.
This model is not intended for direct inference through the Hugging Face transformers library.
Use Marian for inference instead.
The repository includes the following files:
- model.npz.best-chrf.npz: trained Marian model checkpoint
- tiny.decoder.yml: decoder configuration
- vocab.spm: SentencePiece vocabulary
- run_model.sh: example script showing how to run the model

The model was trained using Tatoeba parallel data, with FLORES-200 used as the development set.
Training sentence-pair counts:
Run decoding from inside the model directory:
cat input.spa \
| marian-decoder \
-c tiny.decoder.yml \
--output output.epo \
--normalize \
-m model.npz.best-chrf.npz \
--vocabs vocab.spm vocab.spm \
--log decode.log \
--devices 0
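
The decoding command above can be wrapped in a small script that first checks that the model files are present. The following is a minimal sketch, not the contents of run_model.sh: the helper names check_model_files, device_args, and decode_file, and the MARIAN_CPU_THREADS environment variable, are hypothetical and introduced only for illustration. It assumes marian-decoder is on the PATH and, for the CPU path, that the Marian build was compiled with CPU support.

```shell
#!/bin/sh
# Hypothetical helpers around the marian-decoder call shown above.

# Return 0 if the three files needed for decoding exist in DIR (default: .).
check_model_files() {
    dir="${1:-.}"
    missing=0
    for f in model.npz.best-chrf.npz tiny.decoder.yml vocab.spm; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Print the device flags: GPU 0 by default, or CPU threads when the
# (hypothetical) MARIAN_CPU_THREADS variable is set to a positive number.
device_args() {
    if [ "${MARIAN_CPU_THREADS:-0}" -gt 0 ]; then
        echo "--cpu-threads $MARIAN_CPU_THREADS"
    else
        echo "--devices 0"
    fi
}

# Usage: decode_file INPUT OUTPUT   (run from inside the model directory)
decode_file() {
    check_model_files . || return 1
    # $(device_args) is left unquoted on purpose so it splits into two words.
    marian-decoder \
        -c tiny.decoder.yml \
        -m model.npz.best-chrf.npz \
        --vocabs vocab.spm vocab.spm \
        --normalize \
        --log decode.log \
        $(device_args) \
        --output "$2" < "$1"
}
```

After sourcing the file from inside the model directory, `decode_file input.spa output.epo` reproduces the command above, and `MARIAN_CPU_THREADS=4 decode_file input.spa output.epo` is one way to try decoding without a GPU.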