We are pleased to announce that three of our papers have been accepted to the Sixth Arabic Natural Language Processing Workshop (WANLP 2021), co-located with EACL 2021. The papers were authored by our talented team members Tarek Naous, Wissam Antoun, Reem Mahmoud, and Fady Baly under the supervision of Prof. Hazem Hajj. They target Arabic empathetic conversational agents, generative language models, and language understanding models.
Empathetic BERT2BERT Conversational Model:
Learning Arabic Language Generation with Little Data
Our latest contribution to Arabic Conversational AI leverages knowledge transfer from AraBERT in a BERT2BERT architecture. We address the low-resource challenges and achieve state-of-the-art results in open-domain empathetic response generation.
Paper: https://arxiv.org/abs/2103.04353
Abstract: Enabling empathetic behavior in Arabic dialogue agents is an important aspect of building human-like conversational models. While Arabic Natural Language Processing has seen significant advances in Natural Language Understanding (NLU) with language models such as AraBERT, Natural Language Generation (NLG) remains a challenge. The shortcomings of NLG encoder-decoder models are primarily due to the lack of Arabic datasets suitable to train NLG models such as conversational agents. To overcome this issue, we propose a transformer-based encoder-decoder initialized with AraBERT parameters. By initializing the weights of the encoder and decoder with AraBERT pre-trained weights, our model was able to leverage knowledge transfer and boost performance in response generation. To enable empathy in our conversational model, we train it using the ArabicEmpatheticDialogues dataset and achieve high performance in empathetic response generation. Specifically, our model achieved a low perplexity value of 17.0 and an increase of 5 BLEU points compared to the previous state-of-the-art model. Also, our proposed model was rated highly by 85 human evaluators, validating its high capability in exhibiting empathy while generating relevant and fluent responses in open-domain settings.
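For readers unfamiliar with the perplexity figure quoted in the abstract, here is a minimal sketch of how perplexity is commonly computed from per-token log-probabilities. It is not taken from the paper's code: the function name `perplexity` and the toy inputs are hypothetical, and natural-log probabilities are assumed.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the average negative log-likelihood per token.

    Assumes `token_log_probs` holds natural-log probabilities that a model
    assigned to each reference token (hypothetical input format).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Toy illustration (not real model output): a model that assigns
# probability 1/17 to every reference token has a perplexity of about 17.
toy_log_probs = [math.log(1 / 17)] * 4
toy_ppl = perplexity(toy_log_probs)
```

Lower values mean the model is, on average, less "surprised" by the reference responses.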
AraGPT2:
Pre-Trained Transformer for Arabic Language Generation
AraGPT2 is a 1.5B-parameter transformer model, the largest for Arabic, trained on 77GB of text for 9 days on a TPUv3-128. The model can generate news articles that are difficult to distinguish from human-written articles. AraGPT2 also shows impressive zero-shot performance on trivia QA.
Paper: https://arxiv.org/abs/2012.15520
GitHub: https://github.com/aub-mind/arabert/tree/master/aragpt2
Abstract: Recently, pre-trained transformer-based architectures have proven to be very efficient at language modeling and understanding, given that they are trained on a large enough corpus. Applications in language generation for Arabic are still lagging in comparison to other NLP advances primarily due to the lack of advanced Arabic language generation models. In this paper, we develop the first advanced Arabic language generation model, AraGPT2, trained from scratch on a large Arabic corpus of internet text and news articles. Our largest model, AraGPT2-mega, has 1.46 billion parameters, which makes it the largest Arabic language model available. The Mega model was evaluated and showed success on different tasks including synthetic news generation, and zero-shot question answering. For text generation, our best model achieves a perplexity of 29.8 on held-out Wikipedia articles. A study conducted with human evaluators showed the significant success of AraGPT2-mega in generating news articles that are difficult to distinguish from articles written by humans. We thus develop and release an automatic discriminator model with a 98% accuracy in detecting model-generated text. The models are also publicly available, hoping to encourage new research directions and applications for Arabic NLP.
AraELECTRA:
Pre-Training Text Discriminators for Arabic Language Understanding
AraELECTRA is our latest advancement in Arabic Language Understanding. The model was trained on 77GB of Arabic text for 24 days. AraELECTRA achieves impressive performance, especially on Question Answering tasks.
Paper: https://arxiv.org/abs/2012.15516
GitHub: https://github.com/aub-mind/arabert/tree/master/araelectra
Abstract: Advances in English language representation enabled a more sample-efficient pre-training task by Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA), which, instead of training a model to recover masked tokens, trains a discriminator model to distinguish true input tokens from corrupted tokens that were replaced by a generator network. On the other hand, current Arabic language representation approaches rely only on pretraining via masked language modeling. In this paper, we develop an Arabic language representation model, which we name AraELECTRA. Our model is pretrained using the replaced token detection objective on large Arabic text corpora. We evaluate our model on multiple Arabic NLP tasks, including reading comprehension, sentiment analysis, and named-entity recognition, and we show that AraELECTRA outperforms current state-of-the-art Arabic language representation models, given the same pretraining data and with even a smaller model size.
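To make the replaced token detection objective more concrete, the sketch below shows how per-token training labels for the discriminator can be derived by comparing the original sequence with the generator-corrupted one. This is an illustrative assumption about the labeling step, not AraELECTRA's actual training code; the function name `rtd_labels` and the English toy tokens are hypothetical.

```python
def rtd_labels(original_tokens, corrupted_tokens):
    """Label each position 1 if the token was replaced, else 0.

    A position where the generator happens to sample the original token
    counts as "original" (0), since the two sequences agree there.
    """
    if len(original_tokens) != len(corrupted_tokens):
        raise ValueError("sequences must have the same length")
    return [int(orig != corr)
            for orig, corr in zip(original_tokens, corrupted_tokens)]


# Toy example: a generator swapped "cooked" for "ate".
original = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "meal"]
labels = rtd_labels(original, corrupted)  # [0, 0, 1, 0, 0]
```

Because the discriminator receives a label for every position rather than only for masked ones, each training example provides more learning signal, which is the sample-efficiency argument the abstract refers to.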
Acknowledgments:
This research was supported by the University Research Board (URB) at the American University of Beirut (AUB), and by the TFRC program, which we thank for the free access to cloud TPUs. We also thank As-Safir newspaper for the data access.