Authors: Wissam Antoun, Fady Baly, Hazem Hajj
AraBERT is an Arabic pretrained language model based on Google’s BERT architecture. AraBERT uses the same BERT-Base config. More details are available in the AraBERT PAPER and in the AraBERT Meetup
There are two versions of the model, AraBERTv0.1 and AraBERTv1. The difference is that AraBERTv1 uses pre-segmented text, in which prefixes and suffixes were split using the Farasa Segmenter.
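To make the pre-segmentation concrete, here is a toy sketch of the Farasa-style output format AraBERTv1 expects: clitic prefixes are split off and marked with a trailing "+". The prefix list and the `toy_segment` helper below are hypothetical simplifications for illustration only; the real Farasa Segmenter handles far more prefixes, suffixes, and morphological rules.

```python
# Toy illustration of Farasa-style pre-segmentation (NOT the real Farasa).
# Covers only the definite article "ال", the conjunction "و", and their
# combination, with longest-match-first ordering.
PREFIXES = ("وال", "ال", "و")

def toy_segment(word: str) -> str:
    """Split a known clitic prefix off `word`, marking it with '+'."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p):
            return p + "+ " + word[len(p):]
    return word

print(toy_segment("الكتاب"))  # -> "ال+ كتاب" ("the book" -> "the+ book")
print(toy_segment("كتاب"))    # -> "كتاب" (no clitic prefix, unchanged)
```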
The model was trained on ~70M sentences or ~23GB of Arabic text with ~3B words.
Source Code Repository: https://github.com/aub-mind/arabert
Paper: https://www.aclweb.org/anthology/2020.osact-1.2.pdf
Results (Accuracy)
We evaluate both AraBERT models on different downstream tasks and compare them to mBERT and other state-of-the-art models (to the best of our knowledge). The tasks were Sentiment Analysis on 6 different datasets (HARD, ASTD-Balanced, ArsenTD-Lev, AJGT, LABR, ArSAS), Named Entity Recognition with the ANERcorp, and Arabic Question Answering on Arabic-SQuAD and ARCD.
Task | prev. SOTA | mBERT | AraBERTv0.1 | AraBERTv1 |
---|---|---|---|---|
HARD | 95.7 ElJundi et al. | 95.7 | 96.2 | 96.1 |
ASTD | 86.5 ElJundi et al. | 80.1 | 92.2 | 92.6 |
ArsenTD-Lev | 52.4 ElJundi et al. | 51 | 58.9 | 59.4 |
AJGT | 93 Dahou et al. | 83.6 | 93.1 | 93.8 |
LABR | 87.5 Dahou et al. | 83 | 85.9 | 86.7 |
ANERcorp | 81.7 (BiLSTM-CRF) | 78.4 | 84.2 | 81.9 |
ARCD | mBERT | EM: 34.2, F1: 61.3 | EM: 51.14, F1: 82.13 | EM: 54.84, F1: 82.15 |
Model Weights and Vocab Download
Models | AraBERTv0.1 | AraBERTv1 |
---|---|---|
TensorFlow | Drive Link | Drive Link |
PyTorch | Drive Link | Drive Link |
You can find the PyTorch models in HuggingFace's Transformers library under the aubmindlab username.
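A minimal sketch of loading the models from the hub, assuming the `transformers` package is installed; the model ids below are assumed to follow the aubmindlab naming on the HuggingFace hub:

```python
# Model ids under the aubmindlab namespace (assumed naming).
ARABERT_V1 = "aubmindlab/bert-base-arabert"      # AraBERTv1, expects Farasa pre-segmented text
ARABERT_V01 = "aubmindlab/bert-base-arabertv01"  # AraBERTv0.1, no pre-segmentation

def load_arabert(model_id: str = ARABERT_V1):
    """Download (on first call) and return the AraBERT tokenizer and encoder."""
    from transformers import AutoModel, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model
```

Usage: `tokenizer, model = load_arabert()`, then encode text with `tokenizer(...)` and feed the result to `model(...)` as with any BERT checkpoint. Remember that AraBERTv1 expects its input to be pre-segmented with Farasa first.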
If you use this model, please cite us as:
@inproceedings{antoun2020arabert,
  title={AraBERT: Transformer-based Model for Arabic Language Understanding},
  author={Antoun, Wissam and Baly, Fady and Hajj, Hazem},
  booktitle={LREC 2020 Workshop Language Resources and Evaluation Conference 11--16 May 2020},
  pages={9}
}
Acknowledgments
Thanks to TensorFlow Research Cloud (TFRC) for the free access to Cloud TPUs; we couldn't have done it without this program. Thanks also to the AUB MIND Lab members for their continuous support, and to Yakshof and Assafir for data and storage access. Another thanks to Habib Rahal (https://www.behance.net/rahalhabib) for putting a face to AraBERT.
Contacts
Wissam Antoun: Linkedin | Twitter | Github | wfa07@mail.aub.edu | wissam.antoun@gmail.com
Fady Baly: Linkedin | Twitter | Github | fgb06@mail.aub.edu | baly.fady@gmail.com
We are looking for sponsors to train BERT-Large and other Transformer models. The sponsor only needs to cover the data storage and compute costs of generating the pretraining data.