How to use qikp/small-german-tokenizer with Transformers:
# Load the tokenizer directly (this repository contains a tokenizer, not model weights)
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("qikp/small-german-tokenizer")

This is a small, public-domain-like tokenizer optimized for German.
Special tokens: [EOS] [PAD]

This tokenizer was trained on the context column of the task1 and task4 configs of deutsche-telekom/Ger-RAG-eval.
Because it was trained on a small corpus, this tokenizer may split words into smaller pieces. Also, some uncommon special tokens aren't present; you'll have to add them manually if needed.
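
Adding a missing special token manually could look roughly like the sketch below. The helper name add_missing_special_tokens and the token strings [BOS] and [UNK] are hypothetical, chosen only for illustration. The sketch assumes the tokenizer loads through AutoTokenizer and exposes the standard Transformers add_special_tokens method.

```python
def add_missing_special_tokens(tokenizer, wanted):
    """Add each special token in `wanted` that the tokenizer does not define yet.

    `wanted` maps Transformers attribute names (e.g. "bos_token") to token
    strings. Returns the number of tokens added to the vocabulary.
    """
    # Keep only the entries whose attribute is currently unset on the tokenizer.
    missing = {
        name: token
        for name, token in wanted.items()
        if getattr(tokenizer, name, None) is None
    }
    if not missing:
        return 0
    # add_special_tokens registers the tokens and returns how many were new.
    return tokenizer.add_special_tokens(missing)


# Usage sketch (downloads the tokenizer from the Hub; token strings are
# hypothetical examples, not tokens documented by this repository):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("qikp/small-german-tokenizer")
# add_missing_special_tokens(tokenizer, {"bos_token": "[BOS]", "unk_token": "[UNK]"})
```

If the tokenizer is paired with a model, the model's embedding matrix has to grow to match, for example with model.resize_token_embeddings(len(tokenizer)).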