We are pleased to announce that three of our papers have been accepted to the Sixth Arabic Natural Language Processing Workshop (WANLP 2021), co-located with EACL 2021. The papers were authored by our talented team members Tarek Naous, Wissam Antoun, Reem Mahmoud, and Fady Baly under the supervision of Prof. Hazem Hajj. They target Arabic empathetic conversational agents, generative language models, and language understanding models.
Empathetic BERT2BERT Conversational Model:
Learning Arabic Language Generation with Little Data
Our latest contribution to Arabic conversational AI leverages knowledge transfer from AraBERT in a BERT2BERT architecture. We address the low-resource challenges and achieve state-of-the-art results in open-domain empathetic response generation.
Paper: https://arxiv.org/abs/2103.04353
Abstract: Enabling empathetic behavior in Arabic dialogue agents is an important aspect of building human-like conversational models. While Arabic Natural Language Processing has seen significant advances in Natural Language Understanding (NLU) with language models such as AraBERT, Natural Language Generation (NLG) remains a challenge. The shortcomings of NLG encoder-decoder models are primarily due to the lack of Arabic datasets suitable to train NLG models such as conversational agents. To overcome this issue, we propose a transformer-based encoder-decoder initialized with AraBERT parameters. By initializing the weights of the encoder and decoder with AraBERT pre-trained weights, our model was able to leverage knowledge transfer and boost performance in response generation. To enable empathy in our conversational model, we train it using the ArabicEmpatheticDialogues dataset and achieve high performance in empathetic response generation. Specifically, our model achieved a low perplexity value of 17.0 and an increase of 5 BLEU points compared to the previous state-of-the-art model. Also, our proposed model was rated highly by 85 human evaluators, validating its high capability in exhibiting empathy while generating relevant and fluent responses in open-domain settings.
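To make the warm-start idea concrete, here is a minimal toy sketch in plain Python, not the paper's implementation. It treats a checkpoint as a dictionary of parameter vectors and copies it into both the encoder and the decoder. The abstract does not say how the decoder's cross-attention layers are handled; the sketch assumes the common BERT2BERT setup in which they have no pretrained counterpart and are randomly initialized. The function name, parameter names, and toy values are all hypothetical.

```python
import random


def warm_start_bert2bert(pretrained, cross_attention_names, seed=0):
    """Build encoder and decoder parameter dicts from one pretrained checkpoint."""
    rng = random.Random(seed)
    # The encoder is an exact copy of the pretrained weights.
    encoder = {name: list(values) for name, values in pretrained.items()}
    # The decoder starts from the same pretrained weights.
    decoder = {name: list(values) for name, values in pretrained.items()}
    # Assumption: cross-attention has no pretrained counterpart, so it is
    # randomly initialized and learned during fine-tuning.
    width = len(next(iter(pretrained.values())))
    for name in cross_attention_names:
        decoder[name] = [rng.gauss(0.0, 0.02) for _ in range(width)]
    return encoder, decoder


# Hypothetical stand-in for an AraBERT checkpoint: two tiny parameter vectors.
toy_checkpoint = {
    "layer.0.self_attention": [0.1, -0.2, 0.3],
    "layer.0.feed_forward": [0.5, 0.0, -0.4],
}
encoder, decoder = warm_start_bert2bert(toy_checkpoint, ["layer.0.cross_attention"])
print(sorted(decoder))
```

Only the cross-attention entries start from scratch in this sketch; everything else begins from pretrained values, which is the sense in which knowledge transfers when little conversational data is available.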
AraGPT2:
Pre-Trained Transformer for Arabic Language Generation
AraGPT2 is a 1.5B-parameter transformer model, the largest for Arabic, trained on 77GB of text for 9 days on a TPUv3-128. The model can generate news articles that are difficult to distinguish from human-written articles. AraGPT2 also shows impressive zero-shot performance on trivia QA.
Paper: https://arxiv.org/abs/2012.15520
GitHub: https://github.com/aub-mind/arabert/tree/master/aragpt2
Abstract: Recently, pre-trained transformer-based architectures have proven to be very efficient at language modeling and understanding, given that they are trained on a large enough corpus. Applications in language generation for Arabic are still lagging in comparison to other NLP advances primarily due to the lack of advanced Arabic language generation models. In this paper, we develop the first advanced Arabic language generation model, AraGPT2, trained from scratch on a large Arabic corpus of internet text and news articles. Our largest model, AraGPT2-mega, has 1.46 billion parameters, which makes it the largest Arabic language model available. The Mega model was evaluated and showed success on different tasks including synthetic news generation, and zero-shot question answering. For text generation, our best model achieves a perplexity of 29.8 on held-out Wikipedia articles. A study conducted with human evaluators showed the significant success of AraGPT2-mega in generating news articles that are difficult to distinguish from articles written by humans. We thus develop and release an automatic discriminator model with a 98% accuracy in detecting model-generated text. The models are also publicly available, hoping to encourage new research directions and applications for Arabic NLP.
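For readers unfamiliar with the perplexity figure quoted above, the following sketch shows the standard definition: the exponential of the average negative log-likelihood per token. It is an illustration only. The log-probabilities are made-up values, and the paper's exact evaluation setup (tokenization, context length) is not reproduced here.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Hypothetical natural-log probabilities a model assigned to four held-out tokens.
log_probs = [math.log(0.05), math.log(0.02), math.log(0.04), math.log(0.03)]
print(round(perplexity(log_probs), 1))
```

A useful intuition: a model that assigns probability 1/k to every token has a perplexity of exactly k, so lower values mean the model is less "surprised" by the held-out text.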
AraELECTRA:
Pre-Training Text Discriminators for Arabic Language Understanding
AraELECTRA is our latest advancement in Arabic language understanding. The model was trained on 77GB of Arabic text for 24 days. AraELECTRA achieves impressive performance, especially on question answering tasks.
Paper: https://arxiv.org/abs/2012.15516
GitHub: https://github.com/aub-mind/arabert/tree/master/araelectra
Abstract: Advances in English language representation enabled a more sample-efficient pre-training task by Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA). Instead of training a model to recover masked tokens, ELECTRA trains a discriminator model to distinguish true input tokens from corrupted tokens that were replaced by a generator network. On the other hand, current Arabic language representation approaches rely only on pretraining via masked language modeling. In this paper, we develop an Arabic language representation model, which we name AraELECTRA. Our model is pretrained using the replaced token detection objective on large Arabic text corpora. We evaluate our model on multiple Arabic NLP tasks, including reading comprehension, sentiment analysis, and named-entity recognition, and we show that AraELECTRA outperforms current state-of-the-art Arabic language representation models, given the same pretraining data and with even a smaller model size.
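The replaced token detection objective can be illustrated with a small sketch of how the discriminator's training labels are formed. This is a toy example under stated assumptions, not the AraELECTRA training code: the function name and the English token sequences are hypothetical, and the generator's output is hard-coded rather than sampled from a model.

```python
def replaced_token_labels(original, corrupted):
    """Return 1 where the corrupted token differs from the original, else 0."""
    if len(original) != len(corrupted):
        raise ValueError("sequences must have the same length")
    return [int(o != c) for o, c in zip(original, corrupted)]


# Hypothetical example: positions 0 and 2 were masked and refilled by the generator.
# At position 0 the generator happened to produce the original token again.
original = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "meal"]

labels = replaced_token_labels(original, corrupted)
print(labels)
```

Note that position 0 is labeled as original even though the generator filled it in, because the token matches the true input. The discriminator receives a label for every position, not only the masked ones, which is where the sample efficiency mentioned in the abstract comes from.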
Acknowledgments:
This research was supported by the University Research Board (URB) at the American University of Beirut (AUB), and by the TFRC program, which we thank for the free access to cloud TPUs. We also thank As-Safir newspaper for the data access.