On the importance of Data Scale in Pretraining Arabic Language Models
Paper: arXiv 2401.07760
How to use huawei-noah/JABERv2 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("fill-mask", model="huawei-noah/JABERv2")
# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("huawei-noah/JABERv2")
model = AutoModelForMaskedLM.from_pretrained("huawei-noah/JABERv2")
Note: this model is only compatible with the code in this GitHub repo. It is not supported by the Transformers library, so the Transformers snippet above may fail to load it.
Please cite the following paper when using our code and model:
@misc{ghaddar2024importance,
title={On the importance of Data Scale in Pretraining Arabic Language Models},
author={Abbas Ghaddar and Philippe Langlais and Mehdi Rezagholizadeh and Boxing Chen},
year={2024},
eprint={2401.07760},
archivePrefix={arXiv},
primaryClass={cs.CL}
}