Authors: Wissam Antoun, Fady Baly, Hazem Hajj
AraBERT is an Arabic pretrained language model based on Google’s BERT architecture, using the same configuration as BERT-Base. More details are available in the AraBERT paper and in the AraBERT Meetup.
There are two versions of the model, AraBERTv0.1 and AraBERTv1. The difference is that AraBERTv1 is trained on pre-segmented text, in which prefixes and suffixes were split from the word stem using the Farasa Segmenter.
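To make "pre-segmented" concrete, here is a minimal sketch of what segmented input might look like once a word has already been split into prefixes, stem, and suffixes. It does not call Farasa and performs no morphological analysis. The function name `mark_segments` and the `+` marker convention (a trailing `+` on prefixes, a leading `+` on suffixes) are assumptions made for illustration; check the source code repository for the exact format AraBERTv1 expects.

```python
def mark_segments(prefixes, stem, suffixes):
    """Join already-split morphemes into one space-separated string.

    Assumed convention (not confirmed by this README): prefixes carry a
    trailing "+", suffixes carry a leading "+", and the stem is left bare.
    """
    pieces = [prefix + "+" for prefix in prefixes]
    pieces.append(stem)
    pieces.extend("+" + suffix for suffix in suffixes)
    return " ".join(pieces)


# Hypothetical split of one word into definite article, stem, and suffix.
print(mark_segments(["ال"], "لغ", ["ة"]))
```

AraBERTv0.1 takes unsegmented text, so no such step applies to it.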
The model was trained on ~70M sentences or ~23GB of Arabic text with ~3B words.
Source Code Repository: https://github.com/aub-mind/arabert
Paper: https://www.aclweb.org/anthology/2020.osact-1.2.pdf
Results (Accuracy)
We evaluate both AraBERT models on different downstream tasks and compare them to mBERT and to other state-of-the-art models (to the best of our knowledge). The tasks were Sentiment Analysis on 6 different datasets (HARD, ASTD-Balanced, ArsenTD-Lev, LABR, ArSaS), Named Entity Recognition with ANERcorp, and Arabic Question Answering on Arabic-SQuAD and ARCD.
Task | prev. SOTA | mBERT | AraBERTv0.1 | AraBERTv1 |
---|---|---|---|---|
HARD | 95.7 ElJundi et al. | 95.7 | 96.2 | 96.1 |
ASTD | 86.5 ElJundi et al. | 80.1 | 92.2 | 92.6 |
ArsenTD-Lev | 52.4 ElJundi et al. | 51 | 58.9 | 59.4 |
AJGT | 93 Dahou et al. | 83.6 | 93.1 | 93.8 |
LABR | 87.5 Dahou et al. | 83 | 85.9 | 86.7 |
ANERcorp | 81.7 (BiLSTM-CRF) | 78.4 | 84.2 | 81.9 |
ARCD | mBERT | EM: 34.2, F1: 61.3 | EM: 51.14, F1: 82.13 | EM: 54.84, F1: 82.15 |
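The ARCD row reports EM (exact match) and F1 rather than accuracy. As a rough illustration of how such question-answering scores are commonly computed, here is a minimal sketch of SQuAD-style exact match and token-level F1 for a single prediction. It assumes whitespace tokenization and applies no Arabic-specific normalization; whether the reported numbers were produced in exactly this way is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the whitespace-tokenized answers are identical, else 0.0."""
    return float(prediction.split() == reference.split())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall for one answer pair."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score is then typically the average of these per-question values, taking the best score over the available reference answers for each question.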
Model Weights and Vocab Download
Models | AraBERTv0.1 | AraBERTv1 |
---|---|---|
TensorFlow | Drive Link | Drive Link |
PyTorch | Drive Link | Drive Link |
You can find the PyTorch models in Hugging Face’s Transformers library under the aubmindlab username.
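A minimal sketch of loading the models through the Transformers `Auto*` classes follows. The two Hub identifiers in `MODEL_IDS` are assumptions that are not stated in this README, so verify them on the aubmindlab page before relying on them. `resolve_model_id` and `load_arabert` are hypothetical helper names.

```python
# Assumed Hub identifiers; confirm the exact names under the aubmindlab user.
MODEL_IDS = {
    "v0.1": "aubmindlab/bert-base-arabertv01",
    "v1": "aubmindlab/bert-base-arabert",
}


def resolve_model_id(version: str) -> str:
    """Map an AraBERT version label to its (assumed) Hub identifier."""
    try:
        return MODEL_IDS[version]
    except KeyError:
        raise ValueError(
            f"Unknown AraBERT version {version!r}; expected one of {sorted(MODEL_IDS)}"
        ) from None


def load_arabert(version: str = "v0.1"):
    """Download and return (tokenizer, model) for the requested version."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    model_id = resolve_model_id(version)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model
```

Calling `load_arabert("v1")` requires network access on first use. AraBERTv1 was trained on Farasa-segmented text, so its input should be segmented the same way before tokenization.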
If you use this model, please cite us as:
@inproceedings{antoun2020arabert,
title={AraBERT: Transformer-based Model for Arabic Language Understanding},
author={Antoun, Wissam and Baly, Fady and Hajj, Hazem},
booktitle={LREC 2020 Workshop Language Resources and Evaluation Conference 11--16 May 2020},
pages={9}
}
Acknowledgments
Thanks to TensorFlow Research Cloud (TFRC) for the free access to Cloud TPUs; we couldn’t have done it without this program. Thanks also to the AUB MIND Lab members for their continuous support, to Yakshof and Assafir for data and storage access, and to Habib Rahal (https://www.behance.net/rahalhabib) for putting a face to AraBERT.
Contacts
Wissam Antoun: Linkedin | Twitter | Github | wfa07@mail.aub.edu | wissam.antoun@gmail.com
Fady Baly: Linkedin | Twitter | Github | fgb06@mail.aub.edu | baly.fady@gmail.com
We are looking for sponsors to train BERT-Large and other Transformer models. The sponsor only needs to cover the data storage and compute costs of generating the pretraining data.