Authors: Obeida ElJundi, Wissam Antoun, Nour El Droubi, Hazem Hajj, Wassim El-Hajj, Khaled Shaban
Paper: https://www.aclweb.org/anthology/W19-4608
Source Code Repository: https://github.com/aub-mind/hULMonA
Project: http://oma-project.com/
Introduction
Recent state-of-the-art models in NLP (e.g., BERT, GPT, ULMFiT) utilize transfer learning by pre-training a language model on a large corpus and then fine-tuning it on downstream tasks. We developed the first Arabic-specific universal language model, hULMonA, which can be fine-tuned for almost any Arabic text classification task. We evaluated hULMonA on Sentiment Analysis and achieved state-of-the-art results on four Arabic datasets. hULMonA consists of three main stages:
1. General domain hULMonA pretraining
To capture the various properties of the Arabic language, we train the state-of-the-art language model AWD-LSTM on the entire Arabic Wikipedia.
This step is time-consuming, but it only needs to be done once. We publish our pre-trained model, and it is available in the models directory. For implementation details, or to pre-train your own LM, see build_arabic_language_model.ipynb
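The pre-training objective is next-word prediction. As a minimal, library-free illustration of what a language model learns (the actual model is an AWD-LSTM trained with fastai; the count-based bigram model below is only a toy stand-in, and the corpus is made up):

```python
from collections import Counter, defaultdict

def train_bigram_lm(corpus):
    """Count-based bigram LM: counts[cur][nxt] = how often nxt follows cur."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
    return counts

def next_word_prob(counts, cur, nxt):
    """P(nxt | cur) estimated from the counts."""
    total = sum(counts[cur].values())
    return counts[cur][nxt] / total if total else 0.0

# Toy "general domain" corpus (stands in for Arabic Wikipedia)
corpus = ["the model predicts the next word",
          "the model learns the language"]
lm = train_bigram_lm(corpus)
print(next_word_prob(lm, "the", "model"))  # -> 0.5 ("the" is followed by "model" in 2 of 4 cases)
```

An AWD-LSTM replaces these raw counts with a recurrent network conditioned on the full left context, but the training signal is the same: predict the next token.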
2. Target task hULMonA fine-tuning
The target task data (e.g., Twitter) will likely come from a different distribution than the general-domain data (Wikipedia). Therefore, fine-tuning the pretrained general-domain LM on the target task data is necessary for the LM to adapt to the new textual properties (e.g., dialects).
To fine-tune the pre-trained hULMonA on your own dataset, please check fine_tune_LM.ipynb
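One way to see the domain gap that makes this fine-tuning step necessary is to measure how many target-domain tokens the general-domain vocabulary has never seen. A small sketch (the corpora are invented English stand-ins for formal Wikipedia text vs. dialectal tweets):

```python
def oov_rate(general_corpus, target_corpus):
    """Fraction of target-domain tokens unseen in the general-domain vocabulary."""
    general_vocab = {tok for sent in general_corpus for tok in sent.split()}
    target_tokens = [tok for sent in target_corpus for tok in sent.split()]
    unseen = sum(tok not in general_vocab for tok in target_tokens)
    return unseen / len(target_tokens)

# Formal "Wikipedia-style" text vs. informal "tweet-style" text
general = ["the city is located in the north"]
target  = ["lol the city is soooo far north fr"]
print(oov_rate(general, target))  # -> 0.5: half the tweet tokens are out-of-vocabulary
```

A high out-of-vocabulary rate like this is exactly why the general-domain LM must see target-task text before classification: fine-tuning lets it adapt its representations (and, in the full pipeline, its vocabulary) to dialects and informal spellings.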
3. Target task classification
Finally, for downstream task classification, we augment the fine-tuned hULMonA with two fully connected layers with ReLU and Softmax activations respectively. Implementation details can be found here: fine_tune_LM.ipynb
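The classifier head described above (two fully connected layers, ReLU on the hidden layer, Softmax on the output) can be sketched in plain Python. This is an illustrative forward pass only; the real head is built in fastai/PyTorch on top of the fine-tuned LM's features, and the weights below are arbitrary:

```python
import math

def relu(v):
    """Elementwise max(0, x)."""
    return [max(0.0, x) for x in v]

def softmax(v):
    """Numerically stable softmax: exponentiate and normalize to probabilities."""
    exps = [math.exp(x - max(v)) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

def linear(x, W, b):
    """Fully connected layer: Wx + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def classifier_head(features, W1, b1, W2, b2):
    """Linear -> ReLU -> Linear -> Softmax, as in the hULMonA classification head."""
    hidden = relu(linear(features, W1, b1))
    return softmax(linear(hidden, W2, b2))

# Toy 3-dim "LM features" -> 2 hidden units -> 2 classes (e.g., positive/negative)
W1, b1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]
W2, b2 = [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]
probs = classifier_head([1.0, 2.0, 0.5], W1, b1, W2, b2)
print(probs)  # two non-negative class probabilities summing to 1
```

The Softmax output gives a probability per sentiment class; the predicted label is the argmax.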
How do I cite hULMonA?
Please cite this paper:
@inproceedings{eljundi2019hulmona,
title={hULMonA: The Universal Language Model in Arabic},
author={ElJundi, Obeida and Antoun, Wissam and El Droubi, Nour and Hajj, Hazem and El-Hajj, Wassim and Shaban, Khaled},
booktitle={Proceedings of the Fourth Arabic Natural Language Processing Workshop},
pages={68--77},
year={2019}
}
Contact information
For help, issues, or personal communication related to using hULMonA, please contact Obeida ElJundi (oae15@mail.aub.edu), Wissam Antoun (wfa07@mail.aub.edu), or Nour El Droubi (ngd02@mail.aub.edu).