On the importance of Data Scale in Pretraining Arabic Language Models
Paper: arXiv:2401.07760
How to load huawei-noah/AT5Sv2 with Transformers (but see the compatibility note below):
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("huawei-noah/AT5Sv2")
model = AutoModelForSeq2SeqLM.from_pretrained("huawei-noah/AT5Sv2")
Note: this model is only compatible with the code in this GitHub repo; it is not supported by the Transformers library, so the loading snippet above may not work as-is.
Please cite the following paper when using our code or model:
@misc{ghaddar2024importance,
  title={On the importance of Data Scale in Pretraining Arabic Language Models},
  author={Abbas Ghaddar and Philippe Langlais and Mehdi Rezagholizadeh and Boxing Chen},
  year={2024},
  eprint={2401.07760},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}