Catalan, English, Spanish -> Esperanto MT Model

Model description

This repository contains a multilingual MarianMT model that translates from English, Spanish, and Catalan into Esperanto. It uses a tiny architecture.

This model is not intended for direct inference through the Hugging Face transformers library.

Use the Marian toolkit (marian-decoder) for inference instead.

The repository includes the following files:

model.npz.best-chrf.npz: trained Marian model checkpoint
tiny.decoder.yml: decoder configuration
vocab.spm: SentencePiece vocabulary
run_model.sh: example script showing how to run the model
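
Before decoding, it can help to confirm that a download is complete. The following is a minimal sketch, assuming the four files listed above sit in a single directory; check_model_files is a hypothetical helper written for illustration and is not part of the repository.

```shell
# Sketch: verify that the files listed above are present in a directory.
# check_model_files is a hypothetical helper, not part of the repository.
check_model_files() {
  model_dir="${1:-.}"   # defaults to the current directory
  missing=0
  for f in model.npz.best-chrf.npz tiny.decoder.yml vocab.spm run_model.sh; do
    if [ ! -f "$model_dir/$f" ]; then
      echo "missing: $model_dir/$f" >&2
      missing=1
    fi
  done
  # Exit status 0 when everything is present, 1 otherwise.
  return "$missing"
}
```

Calling check_model_files with no argument checks the current directory; passing a path checks that directory instead.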

Training data

The model was trained using Tatoeba parallel data, with FLORES-200 used as the development set.

Training sentence-pair counts:

ca-eo: 672,931
es-eo: 4,677,945
eo-en: 5,000,000
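
To make the balance between language pairs easier to see, the counts above can be summed and expressed as shares of the total. This is only arithmetic on the listed numbers; the share helper and the variable names are hypothetical.

```shell
# Training sentence-pair counts, copied from the list above.
ca_eo=672931
es_eo=4677945
eo_en=5000000

total=$((ca_eo + es_eo + eo_en))
echo "total pairs: $total"

# share COUNT TOTAL prints COUNT as a percentage of TOTAL with one decimal.
share() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f", 100 * n / t }'
}

echo "ca-eo: $(share "$ca_eo" "$total")%"
echo "es-eo: $(share "$es_eo" "$total")%"
echo "eo-en: $(share "$eo_en" "$total")%"
```

The total comes to 10,350,876 sentence pairs, of which the Catalan data makes up roughly 6.5%.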

Inference

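Run decoding from inside the model directory. The example reads Spanish sentences from input.spa, one per line, and writes the Esperanto translations to output.epo: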

cat input.spa | \
  marian-decoder \
  -c tiny.decoder.yml \
  --output output.epo \
  --normalize \
  -m model.npz.best-chrf.npz \
  --vocabs vocab.spm vocab.spm \
  --log decode.log \
  --devices 0
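
The same call can be wrapped in a small function so that it is easy to reuse for Catalan or English input. This is a sketch under stated assumptions rather than the contents of run_model.sh: translate_file and the MARIAN_DECODER override are hypothetical names, and the function assumes it is run from inside the model directory with marian-decoder on the PATH.

```shell
# Sketch: decode one file and check that no lines were dropped.
# translate_file and MARIAN_DECODER are hypothetical names for illustration.
translate_file() {
  input="$1"
  output="$2"
  decoder="${MARIAN_DECODER:-marian-decoder}"

  if [ ! -r "$input" ]; then
    echo "cannot read input file: $input" >&2
    return 1
  fi

  # Same options as the command above; stdin redirection replaces `cat`.
  "$decoder" \
    -c tiny.decoder.yml \
    --output "$output" \
    --normalize \
    -m model.npz.best-chrf.npz \
    --vocabs vocab.spm vocab.spm \
    --log decode.log \
    --devices 0 \
    < "$input" || return 1

  # Marian writes one translation per input line, so the counts should match.
  if [ $(wc -l < "$input") -ne $(wc -l < "$output") ]; then
    echo "line count mismatch between $input and $output" >&2
    return 1
  fi
}
```

For example, translate_file input.spa output.epo reproduces the command above. The --devices 0 option selects the first GPU; on a machine without one, Marian's --cpu-threads option is the usual alternative, though whether this model's configuration works unchanged on CPU is not stated here.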
