Authors: Wissam Antoun, Fady Baly, Hazem Hajj


AraBERT is an Arabic pretrained language model based on Google's BERT architecture, using the same BERT-Base configuration. More details are available in the AraBERT Paper and in the AraBERT Meetup.

There are two versions of the model, AraBERTv0.1 and AraBERTv1, the difference being that AraBERTv1 uses pre-segmented text in which prefixes and suffixes were split using the Farasa Segmenter (see the sketch below).
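To illustrate the pre-segmentation step used for AraBERTv1, here is a minimal sketch assuming the farasapy Python wrapper around the Farasa Segmenter; the exact package and API may differ from the setup used during pretraining.

```python
# A minimal sketch of Farasa-style pre-segmentation for AraBERTv1.
# Assumes the farasapy wrapper (pip install farasapy); the exact API and the
# segmenter version used for pretraining may differ.
from farasa.segmenter import FarasaSegmenter

segmenter = FarasaSegmenter(interactive=True)

text = "ولن نبالغ إذا قلنا إن هاتف أو كمبيوتر المكتب في زمننا هذا ضروري"
segmented = segmenter.segment(text)

# Prefixes and suffixes are split off with '+' markers (e.g. "ولن" -> "و+لن"),
# which is the form AraBERTv1 expects as input.
print(segmented)
```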

The model was trained on ~70M sentences or ~23GB of Arabic text with ~3B words.

Source Code Repository:


Results (Accuracy)

We evaluate both AraBERT models on different downstream tasks and compare them to mBERT and other state-of-the-art models (to the best of our knowledge). The tasks were Sentiment Analysis on 6 different datasets (HARD, ASTD-Balanced, ArsenTD-Lev, LABR, ArSAS), Named Entity Recognition on the ANERcorp dataset, and Arabic Question Answering on Arabic-SQuAD and ARCD.

| Task | prev. SOTA | mBERT | AraBERTv0.1 | AraBERTv1 |
|:---|:---|:---|:---|:---|
| HARD | 95.7 (ElJundi et al.) | 95.7 | 96.2 | 96.1 |
| ASTD | 86.5 (ElJundi et al.) | 80.1 | 92.2 | 92.6 |
| ArsenTD-Lev | 52.4 (ElJundi et al.) | 51 | 58.9 | 59.4 |
| AJGT | 93 (Dahou et al.) | 83.6 | 93.1 | 93.8 |
| LABR | 87.5 (Dahou et al.) | 83 | 85.9 | 86.7 |
| ANERcorp | 81.7 (BiLSTM-CRF) | 78.4 | 84.2 | 81.9 |
| ARCD | mBERT | EM: 34.2 / F1: 61.3 | EM: 51.14 / F1: 82.13 | EM: 54.84 / F1: 82.15 |

Model Weights and Vocab Download

| Models | AraBERTv0.1 | AraBERTv1 |
|:---|:---|:---|
| TensorFlow | Drive Link | Drive Link |
| PyTorch | Drive Link | Drive Link |

You can find the PyTorch models in HuggingFace's Transformers library under the aubmindlab username.
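For reference, here is a minimal sketch of loading AraBERT through the Transformers library; the model identifier used below is an assumption, so check the aubmindlab page on the HuggingFace hub for the exact names of the v0.1 and v1 checkpoints.

```python
from transformers import AutoTokenizer, AutoModel

# "aubmindlab/bert-base-arabert" is an assumed identifier; check the aubmindlab
# page on the HuggingFace hub for the exact names of AraBERTv0.1 and AraBERTv1.
model_name = "aubmindlab/bert-base-arabert"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode a short Arabic sentence and extract contextual embeddings.
inputs = tokenizer("مرحبا بالعالم", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```

Note that for AraBERTv1 the input text should first be pre-segmented with Farasa, as described above.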

If you used this model, please cite us as:

@inproceedings{antoun2020arabert,
  title={AraBERT: Transformer-based Model for Arabic Language Understanding},
  author={Antoun, Wissam and Baly, Fady and Hajj, Hazem},
  booktitle={LREC 2020 Workshop Language Resources and Evaluation Conference 11--16 May 2020},
}


Thanks to the TensorFlow Research Cloud (TFRC) program for free access to Cloud TPUs; we couldn't have done this without it. Thanks also to the AUB MIND Lab members for the continuous support, to Yakshof and Assafir for data and storage access, and to Habib Rahal for putting a face to AraBERT.


Wissam Antoun: LinkedIn | Twitter | GitHub

Fady Baly: LinkedIn | Twitter | GitHub

We are looking for sponsors to train BERT-Large and other Transformer models; the sponsor would only need to cover the data storage and compute costs of generating the pretraining data.