Authors: Wissam Antoun, Fady Baly, Hazem Hajj
AraBERT is an Arabic pretrained language model based on Google’s BERT architecture. AraBERT uses the same BERT-Base configuration. More details are available in the AraBERT paper and in the AraBERT Meetup.
There are two versions of the model, AraBERTv0.1 and AraBERTv1, the difference being that AraBERTv1 uses pre-segmented text, where prefixes and suffixes were split using the Farasa Segmenter.
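For illustration, here is a minimal sketch of that pre-segmentation step, assuming the farasapy Python wrapper around the Farasa Segmenter (the wrapper and its API are assumptions here, not part of this release):

```python
# A minimal sketch of the Farasa pre-segmentation AraBERTv1 expects,
# assuming the farasapy wrapper (pip install farasapy; needs a Java runtime).
from farasa.segmenter import FarasaSegmenter

segmenter = FarasaSegmenter(interactive=True)

text = "ولن نبالغ إذا قلنا إن هاتف أو كمبيوتر المكتب في زمننا هذا ضروري"
# Farasa marks split prefixes/suffixes with '+', e.g. "و+ لن" for "ولن".
print(segmenter.segment(text))
```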
The model was trained on ~70M sentences or ~23GB of Arabic text with ~3B words.
Source Code Repository: https://github.com/aub-mind/arabert
Paper: https://www.aclweb.org/anthology/2020.osact-1.2.pdf
Results (Accuracy)
We evaluate both AraBERT models on different downstream tasks and compare them to mBERT and other state-of-the-art models (to the best of our knowledge). The tasks were Sentiment Analysis on 6 different datasets (HARD, ASTD-Balanced, ArsenTD-Lev, AJGT, LABR, ArSAS), Named Entity Recognition on ANERcorp, and Arabic Question Answering on Arabic-SQuAD and ARCD. A minimal fine-tuning sketch follows the results table below.
Task | prev. SOTA | mBERT | AraBERTv0.1 | AraBERTv1 |
---|---|---|---|---|
HARD | 95.7 (ElJundi et al.) | 95.7 | 96.2 | 96.1 |
ASTD | 86.5 (ElJundi et al.) | 80.1 | 92.2 | 92.6 |
ArsenTD-Lev | 52.4 (ElJundi et al.) | 51.0 | 58.9 | 59.4 |
AJGT | 93.0 (Dahou et al.) | 83.6 | 93.1 | 93.8 |
LABR | 87.5 (Dahou et al.) | 83.0 | 85.9 | 86.7 |
ANERcorp | 81.7 (BiLSTM-CRF) | 78.4 | 84.2 | 81.9 |
ARCD | mBERT | EM: 34.2 / F1: 61.3 | EM: 51.14 / F1: 82.13 | EM: 54.84 / F1: 82.15 |
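For illustration, here is a minimal fine-tuning sketch for the sentence-classification setup above, assuming the Hugging Face transformers library and the aubmindlab/bert-base-arabert model ID; the batch, labels, and training step are placeholders, not the exact training code used in the paper:

```python
# A minimal sketch, not the paper's training code: fine-tuning AraBERT
# for binary sentiment classification with Hugging Face transformers.
# Requires: pip install torch transformers
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "aubmindlab/bert-base-arabert"  # assumed Hub ID for AraBERTv1
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy batch; AraBERTv1 expects Farasa pre-segmented input (see above).
texts = ["و+ لن نبالغ إذا قل +نا إن الفيلم رائع"]
labels = torch.tensor([1])  # placeholder label scheme: 1 = positive

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)
outputs.loss.backward()  # an optimizer step would follow in a real loop
```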
Model Weights and Vocab Download
Models | AraBERTv0.1 | AraBERTv1 |
---|---|---|
TensorFlow | Drive Link | Drive Link |
PyTorch | Drive Link | Drive Link |
You can find the PyTorch models in HuggingFace’s Transformers library under the aubmindlab username.
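For example, a minimal loading sketch with the transformers library (the exact model IDs under aubmindlab are assumed here from the release naming):

```python
# A minimal sketch: loading the released PyTorch weights from the
# Hugging Face Hub. Model IDs are assumed from the aubmindlab naming.
from transformers import AutoTokenizer, AutoModel

# AraBERTv0.1 (raw text); use "aubmindlab/bert-base-arabert" for
# AraBERTv1, which expects Farasa pre-segmented input.
tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv01")
model = AutoModel.from_pretrained("aubmindlab/bert-base-arabertv01")
```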
If you used this model, please cite us as:
```
@inproceedings{antoun2020arabert,
  title={AraBERT: Transformer-based Model for Arabic Language Understanding},
  author={Antoun, Wissam and Baly, Fady and Hajj, Hazem},
  booktitle={LREC 2020 Workshop Language Resources and Evaluation Conference 11--16 May 2020},
  pages={9}
}
```
Acknowledgments
Thanks to the TensorFlow Research Cloud (TFRC) for free access to Cloud TPUs; we couldn’t have done it without this program. Thanks also to the AUB MIND Lab members for their continuous support, to Yakshof and Assafir for data and storage access, and to Habib Rahal (https://www.behance.net/rahalhabib) for putting a face to AraBERT.
Contacts
Wissam Antoun: LinkedIn | Twitter | GitHub | wfa07@mail.aub.edu | wissam.antoun@gmail.com
Fady Baly: LinkedIn | Twitter | GitHub | fgb06@mail.aub.edu | baly.fady@gmail.com
We are looking for sponsors to train BERT-Large and other Transformer models; the sponsor only needs to cover the data storage and compute costs of generating the pretraining data.