# Vietnamese ↔ English Transformer (25M)
A custom bidirectional Vietnamese–English translation model built from scratch on a Transformer encoder–decoder architecture (~25 M parameters), with a shared 32k-token BPE vocabulary and weight tying.
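Weight tying with a shared vocabulary means one embedding matrix serves both as the input lookup table and (transposed) as the output projection, which is where much of the parameter saving comes from. A minimal NumPy sketch of the idea; the sizes here are illustrative assumptions (`d_model=256` is not taken from this repo's config):

```python
import numpy as np

# Illustrative sizes: 32k shared BPE vocab (from the model card),
# d_model=256 is an assumption for the sketch.
vocab_size, d_model = 32000, 256

# One shared matrix E plays both roles.
E = np.random.randn(vocab_size, d_model) * 0.02

def embed(token_ids):
    # Input embedding: look up rows of E.
    return E[token_ids]

def output_logits(hidden):
    # Output projection: multiply by E^T instead of a separate weight,
    # so no extra (d_model x vocab_size) matrix is stored.
    return hidden @ E.T

h = embed(np.array([1, 2, 3]))   # shape (3, d_model)
logits = output_logits(h)        # shape (3, vocab_size)
```

Untying would add a second `vocab_size × d_model` matrix, roughly 8 M extra parameters at these sizes.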
## Model Details
| Property | Value |
|---|---|
| Architecture | Transformer encoder–decoder |
| Parameters | ~25 M |
| Vocabulary | 32,000 shared (BPE) |
| Training data | MTET bidirectional dataset (cleaned) |
| Direction | VI → EN and EN → VI (bidirectional) |
| Precision | BF16 |
## Usage
```python
import sys, torch
from pathlib import Path

# Make the repo's src/ importable
root = Path(".")
sys.path.append(str(root / "src"))

from complete_transformer import create_model
from shared_vocab_utils import load_shared_vocab_info, create_shared_vocab_wrapper
from inference_evaluation import translate_sentence

# Load shared-vocabulary metadata and tokenizer wrappers
info = load_shared_vocab_info()
vi_vocab, en_vocab = create_shared_vocab_wrapper()

# Build the 25M-parameter model with shared vocabulary and weight tying
model, cfg = create_model(
    info["vocab_size"], info["vocab_size"],
    model_size="custom_25m",
    pad_idx=info["pad_id"],
    use_shared_vocab=True,
    use_weight_tying=True,
)

# Restore the best checkpoint and move to GPU if available
ckpt = torch.load("checkpoints/best_model.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Translate Vietnamese → English with beam search
sentence = "xin chào, hôm nay thời tiết thế nào?"  # "hello, how is the weather today?"
translation = translate_sentence(model, sentence, vi_vocab, en_vocab, device,
                                 use_beam_search=True, beam_size=5)
print(translation)
```
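`translate_sentence` decodes with beam search (`beam_size=5` above). For readers unfamiliar with the technique, here is a generic, self-contained sketch of beam search over a toy scoring function; it is independent of this repo's implementation, and `step_fn`, `bos`, and `eos` are hypothetical names for the sketch:

```python
def beam_search(step_fn, bos, eos, beam_size=5, max_len=20):
    """Generic beam search sketch (not the repo's implementation).

    step_fn(prefix) -> list of (token, log_prob) continuations.
    Keeps the beam_size highest-scoring partial hypotheses each step
    and returns the best hypothesis that reached eos.
    """
    beams = [([bos], 0.0)]          # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:       # hypothesis already complete
                finished.append((seq, score))
                continue
            for tok, lp in step_fn(seq):
                candidates.append((seq + [tok], score + lp))
        if not candidates:
            break
        # Prune to the beam_size best partial hypotheses
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_size]
    finished.extend(b for b in beams if b[0][-1] == eos)
    pool = finished or beams         # fall back if nothing reached eos
    return max(pool, key=lambda x: x[1])[0]

# Toy model: at every step, token 1 has log-prob -1.0 and eos (2) has -0.5,
# so the best full hypothesis is to emit eos immediately.
toy_step = lambda prefix: [(1, -1.0), (2, -0.5)]
print(beam_search(toy_step, bos=0, eos=2))  # → [0, 2]
```

Greedy decoding is the `beam_size=1` special case; larger beams trade decoding time for a wider search over hypotheses.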
## Training
- Dataset: MTET bidirectional Vietnamese–English parallel corpus (cleaned)
- Optimizer: Adam with warm-up scheduling
- Epochs: 27+ (with resume support)
- Hardware: Single GPU with BF16 mixed precision
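"Adam with warm-up scheduling" typically refers to the inverse-square-root schedule from the original Transformer recipe: the learning rate ramps up linearly for a fixed number of warm-up steps, then decays proportionally to 1/√step. A sketch of that schedule; the `d_model=256` and `warmup=4000` values are illustrative assumptions, not this repo's actual config:

```python
def noam_lr(step, d_model=256, warmup=4000):
    """Inverse-square-root warm-up schedule (Transformer-style).

    Linear warm-up for `warmup` steps, then decay proportional to
    1/sqrt(step); the peak occurs exactly at step == warmup.
    """
    step = max(step, 1)  # guard against step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# Rises during warm-up, peaks at step 4000, then decays.
lrs = [noam_lr(s) for s in (100, 4000, 8000)]
```

In PyTorch this is usually wired up with `torch.optim.lr_scheduler.LambdaLR` wrapping the Adam optimizer.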
## Files
| File | Description |
|---|---|
| `checkpoints/best_model.pt` | Best model checkpoint |
| `data/processed/tokenizer_shared.json` | Shared BPE tokenizer |
| `data/processed/shared_vocab_info.json` | Vocabulary metadata |
| `src/` | Full model source code |