# Vietnamese ↔ English Transformer (25M)
A custom bidirectional Vietnamese–English translation model built from scratch on a Transformer encoder–decoder architecture (~25 M parameters), with a shared 32k-token BPE vocabulary and weight tying.
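Weight tying with a shared vocabulary means one embedding matrix serves both as the input lookup table and (transposed) as the output projection, which is where much of the parameter saving comes from. A minimal NumPy sketch of the idea; the sizes here are illustrative assumptions (`d_model=256` is not taken from this repo's config):

```python
import numpy as np

# Illustrative sizes: 32k shared BPE vocab (from the model card),
# d_model=256 is an assumption for the sketch.
vocab_size, d_model = 32000, 256

# One shared matrix E plays both roles.
E = np.random.randn(vocab_size, d_model) * 0.02

def embed(token_ids):
    # Input embedding: look up rows of E.
    return E[token_ids]

def output_logits(hidden):
    # Output projection: multiply by E^T instead of a separate weight,
    # so no extra (d_model x vocab_size) matrix is stored.
    return hidden @ E.T

h = embed(np.array([1, 2, 3]))   # shape (3, d_model)
logits = output_logits(h)        # shape (3, vocab_size)
```

Untying would add a second `vocab_size × d_model` matrix, roughly 8 M extra parameters at these sizes.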
## Model Details
| Property | Value |
|---|---|
| Architecture | Transformer encoder–decoder |
| Parameters | ~25 M |
| Vocabulary | 32,000 shared (BPE) |
| Training data | MTET bidirectional dataset (cleaned) |
| Direction | VI → EN and EN → VI (bidirectional) |
| Precision | BF16 |
## Usage
```python
import sys, torch
from pathlib import Path

# Make the repo's src/ importable
root = Path(".")
sys.path.append(str(root / "src"))

from complete_transformer import create_model
from shared_vocab_utils import load_shared_vocab_info, create_shared_vocab_wrapper
from inference_evaluation import translate_sentence

# Load shared-vocabulary metadata and tokenizer wrappers
info = load_shared_vocab_info()
vi_vocab, en_vocab = create_shared_vocab_wrapper()

# Build the 25M-parameter model with shared vocabulary and weight tying
model, cfg = create_model(
    info["vocab_size"], info["vocab_size"],
    model_size="custom_25m",
    pad_idx=info["pad_id"],
    use_shared_vocab=True,
    use_weight_tying=True,
)

# Restore the best checkpoint and move to GPU if available
ckpt = torch.load("checkpoints/best_model.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Translate Vietnamese → English with beam search
sentence = "xin chào, hôm nay thời tiết thế nào?"  # "hello, how is the weather today?"
translation = translate_sentence(model, sentence, vi_vocab, en_vocab, device,
                                 use_beam_search=True, beam_size=5)
print(translation)
```
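`translate_sentence` decodes with beam search (`beam_size=5` above). For readers unfamiliar with the technique, here is a generic, self-contained sketch of beam search over a toy scoring function; it is independent of this repo's implementation, and `step_fn`, `bos`, and `eos` are hypothetical names for the sketch:

```python
def beam_search(step_fn, bos, eos, beam_size=5, max_len=20):
    """Generic beam search sketch (not the repo's implementation).

    step_fn(prefix) -> list of (token, log_prob) continuations.
    Keeps the beam_size highest-scoring partial hypotheses each step
    and returns the best hypothesis that reached eos.
    """
    beams = [([bos], 0.0)]          # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:       # hypothesis already complete
                finished.append((seq, score))
                continue
            for tok, lp in step_fn(seq):
                candidates.append((seq + [tok], score + lp))
        if not candidates:
            break
        # Prune to the beam_size best partial hypotheses
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_size]
    finished.extend(b for b in beams if b[0][-1] == eos)
    pool = finished or beams         # fall back if nothing reached eos
    return max(pool, key=lambda x: x[1])[0]

# Toy model: at every step, token 1 has log-prob -1.0 and eos (2) has -0.5,
# so the best full hypothesis is to emit eos immediately.
toy_step = lambda prefix: [(1, -1.0), (2, -0.5)]
print(beam_search(toy_step, bos=0, eos=2))  # → [0, 2]
```

Greedy decoding is the `beam_size=1` special case; larger beams trade decoding time for a wider search over hypotheses.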
## Training
- Dataset: MTET bidirectional Vietnamese–English parallel corpus (cleaned)
- Optimizer: Adam with warm-up scheduling
- Epochs: 27+ (with resume support)
- Hardware: Single GPU with BF16 mixed precision
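"Adam with warm-up scheduling" typically refers to the inverse-square-root schedule from the original Transformer recipe: the learning rate ramps up linearly for a fixed number of warm-up steps, then decays proportionally to 1/√step. A sketch of that schedule; the `d_model=256` and `warmup=4000` values are illustrative assumptions, not this repo's actual config:

```python
def noam_lr(step, d_model=256, warmup=4000):
    """Inverse-square-root warm-up schedule (Transformer-style).

    Linear warm-up for `warmup` steps, then decay proportional to
    1/sqrt(step); the peak occurs exactly at step == warmup.
    """
    step = max(step, 1)  # guard against step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# Rises during warm-up, peaks at step 4000, then decays.
lrs = [noam_lr(s) for s in (100, 4000, 8000)]
```

In PyTorch this is usually wired up with `torch.optim.lr_scheduler.LambdaLR` wrapping the Adam optimizer.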
## Files
| File | Description |
|---|---|
| `checkpoints/best_model.pt` | Best model checkpoint |
| `data/processed/tokenizer_shared.json` | Shared BPE tokenizer |
| `data/processed/shared_vocab_info.json` | Vocabulary metadata |
| `src/` | Full model source code |