# Vietnamese ↔ English Transformer (25M)

A custom bidirectional Vietnamese–English translation model built from scratch using a Transformer encoder-decoder architecture (~25 M parameters) with shared vocabulary (32k tokens) and weight-tying.

## Model Details

| Property | Value |
|---|---|
| Architecture | Transformer encoder-decoder |
| Parameters | ~25M |
| Vocabulary | 32,000 shared (BPE) |
| Training data | MTET bidirectional dataset (cleaned) |
| Direction | VI → EN and EN → VI (bidirectional) |
| Precision | BF16 |
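Weight-tying shares a single parameter matrix between the token embedding and the output projection, which is where much of the parameter savings at a 32k shared vocabulary comes from. A minimal PyTorch sketch of the idea (the module names below are illustrative, not the repo's actual classes):

```python
import torch.nn as nn

# Illustrative dimensions matching the model card: 32k shared vocab.
# D_MODEL is an assumption; the repo's actual hidden size may differ.
VOCAB_SIZE = 32_000
D_MODEL = 512

embedding = nn.Embedding(VOCAB_SIZE, D_MODEL)
output_proj = nn.Linear(D_MODEL, VOCAB_SIZE, bias=False)

# Tie the output projection to the embedding: both modules now share one
# (vocab_size, d_model) tensor, so the softmax layer adds no new parameters.
output_proj.weight = embedding.weight
```

With a shared vocabulary, the same tied matrix serves both the Vietnamese and English sides, which is what makes the bidirectional setup parameter-efficient.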

## Usage

```python
import sys, torch
from pathlib import Path

root = Path(".")          # set to the repo root after cloning
sys.path.append(str(root / "src"))

from complete_transformer import create_model
from shared_vocab_utils import load_shared_vocab_info, create_shared_vocab_wrapper
from inference_evaluation import translate_sentence

# Load the shared-vocabulary metadata and tokenizer wrappers
info = load_shared_vocab_info()
vi_vocab, en_vocab = create_shared_vocab_wrapper()

# Build the model with the same configuration it was trained with
model, cfg = create_model(
    info["vocab_size"], info["vocab_size"],
    model_size="custom_25m",
    pad_idx=info["pad_id"],
    use_shared_vocab=True,
    use_weight_tying=True,
)
ckpt = torch.load("checkpoints/best_model.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

sentence = "xin chào, hôm nay thời tiết thế nào?"  # "hello, how is the weather today?"
translation = translate_sentence(model, sentence, vi_vocab, en_vocab, device,
                                 use_beam_search=True, beam_size=5)
print(translation)
```

## Training

- **Dataset:** MTET bidirectional (Vietnamese–English parallel corpus, cleaned)
- **Optimizer:** Adam with warm-up scheduling
- **Epochs:** 27+ (with resume support)
- **Hardware:** Single GPU with BF16 mixed precision
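Adam warm-up for Transformers is typically the inverse-square-root ("Noam") schedule from the original Transformer paper: the learning rate rises roughly linearly for `warmup` steps, then decays proportionally to `step ** -0.5`. The exact schedule used in this repo is an assumption; a minimal sketch:

```python
# Inverse-square-root warm-up schedule (a sketch, not the repo's exact code).
# d_model and warmup values are illustrative.
def noam_lr(step, d_model=512, warmup=4000):
    step = max(step, 1)  # avoid step**-0.5 at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```

The peak learning rate occurs exactly at `step == warmup`, where the two terms inside `min` are equal.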

## Files

| File | Description |
|---|---|
| `checkpoints/best_model.pt` | Best model checkpoint |
| `data/processed/tokenizer_shared.json` | Shared BPE tokenizer |
| `data/processed/shared_vocab_info.json` | Vocabulary metadata |
| `src/` | Full model source code |