We are pleased to announce that three of our papers have been accepted to the Sixth Arabic Natural Language Processing Workshop (WANLP 2021), co-located with EACL 2021. The papers were authored by our talented team members Tarek Naous, Wissam Antoun, Reem Mahmoud, and Fady Baly under the supervision of Prof. Hazem Hajj. They target Arabic empathetic conversational agents, generative language models, and language understanding models.
Empathetic BERT2BERT Conversational Model:
Learning Arabic Language Generation with Little Data
Our latest contribution to Arabic Conversational AI leverages knowledge transfer from AraBERT in a BERT2BERT architecture. We address the low-resource challenges and achieve state-of-the-art results in open-domain empathetic response generation.
Paper: https://arxiv.org/abs/2103.04353
Abstract: Enabling empathetic behavior in Arabic dialogue agents is an important aspect of building human-like conversational models. While Arabic Natural Language Processing has seen significant advances in Natural Language Understanding (NLU) with language models such as AraBERT, Natural Language Generation (NLG) remains a challenge. The shortcomings of NLG encoder-decoder models are primarily due to the lack of Arabic datasets suitable to train NLG models such as conversational agents. To overcome this issue, we propose a transformer-based encoder-decoder initialized with AraBERT parameters. By initializing the weights of the encoder and decoder with AraBERT pre-trained weights, our model was able to leverage knowledge transfer and boost performance in response generation. To enable empathy in our conversational model, we train it using the ArabicEmpatheticDialogues dataset and achieve high performance in empathetic response generation. Specifically, our model achieved a low perplexity value of 17.0 and an increase of 5 BLEU points compared to the previous state-of-the-art model. Also, our proposed model was rated highly by 85 human evaluators, validating its high capability in exhibiting empathy while generating relevant and fluent responses in open-domain settings.
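For readers less familiar with the perplexity metric quoted in the abstract, the sketch below shows the standard definition: the exponential of the average negative log-probability the model assigns to each reference token. The function name and the example probabilities are hypothetical and chosen only for illustration; they are not taken from the paper or its evaluation code.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability per token)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood over the reference tokens.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# Hypothetical probabilities a model assigned to each reference token.
example_probs = [0.25, 0.25, 0.25, 0.25]
print(perplexity(example_probs))
```

A model that spreads its probability uniformly over four choices at every step has a perplexity of about 4, so a lower value means the model is less "surprised" by the reference text.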
AraGPT2:
Pre-Trained Transformer for Arabic Language Generation
AraGPT2 is a 1.5B-parameter transformer model, the largest for Arabic, trained on 77GB of text for 9 days on a TPUv3-128. The model can generate news articles that are difficult to distinguish from human-written articles. AraGPT2 also shows impressive zero-shot performance on trivia QA.
Paper: https://arxiv.org/abs/2012.15520
GitHub: https://github.com/aub-mind/arabert/tree/master/aragpt2
Abstract: Recently, pre-trained transformer-based architectures have proven to be very efficient at language modeling and understanding, given that they are trained on a large enough corpus. Applications in language generation for Arabic are still lagging in comparison to other NLP advances primarily due to the lack of advanced Arabic language generation models. In this paper, we develop the first advanced Arabic language generation model, AraGPT2, trained from scratch on a large Arabic corpus of internet text and news articles. Our largest model, AraGPT2-mega, has 1.46 billion parameters, which makes it the largest Arabic language model available. The Mega model was evaluated and showed success on different tasks including synthetic news generation, and zero-shot question answering. For text generation, our best model achieves a perplexity of 29.8 on held-out Wikipedia articles. A study conducted with human evaluators showed the significant success of AraGPT2-mega in generating news articles that are difficult to distinguish from articles written by humans. We thus develop and release an automatic discriminator model with a 98% accuracy in detecting model-generated text. The models are also publicly available, hoping to encourage new research directions and applications for Arabic NLP.
AraELECTRA:
Pre-Training Text Discriminators for Arabic Language Understanding
AraELECTRA is our latest advancement in Arabic Language Understanding. The model was trained on 77GB of Arabic text for 24 days. AraELECTRA achieves impressive performance, especially on Question Answering tasks.
Paper: https://arxiv.org/abs/2012.15516
GitHub: https://github.com/aub-mind/arabert/tree/master/araelectra
Abstract: Advances in English language representation enabled a more sample-efficient pre-training task by Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA). Instead of training a model to recover masked tokens, ELECTRA trains a discriminator model to distinguish true input tokens from corrupted tokens that were replaced by a generator network. On the other hand, current Arabic language representation approaches rely only on pretraining via masked language modeling. In this paper, we develop an Arabic language representation model, which we name AraELECTRA. Our model is pretrained using the replaced token detection objective on large Arabic text corpora. We evaluate our model on multiple Arabic NLP tasks, including reading comprehension, sentiment analysis, and named-entity recognition, and we show that AraELECTRA outperforms current state-of-the-art Arabic language representation models, given the same pretraining data and with even a smaller model size.
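To make the replaced token detection objective more concrete, here is a minimal sketch of how the discriminator's training labels can be built. It is a toy illustration under loud assumptions: the generator network is swapped for uniform random sampling from a small hypothetical vocabulary, and the function name, replacement probability, and example tokens are invented for this sketch rather than taken from the AraELECTRA code.

```python
import random


def replaced_token_labels(tokens, vocab, replace_prob=0.15, seed=0):
    """Corrupt some tokens and label each position: 1 = replaced, 0 = original.

    Assumption: a real setup samples replacements from a small generator
    network; here we sample uniformly from `vocab` as a stand-in.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            new_tok = rng.choice(vocab)
            corrupted.append(new_tok)
            # If the sampled token equals the original, it counts as original.
            labels.append(int(new_tok != tok))
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels


# Hypothetical toy sentence and vocabulary.
sentence = ["the", "model", "reads", "arabic", "text"]
toy_vocab = ["the", "model", "reads", "writes", "arabic", "english", "text"]
corrupted, labels = replaced_token_labels(sentence, toy_vocab, replace_prob=0.5)
print(corrupted)
print(labels)
```

The discriminator is then trained to predict the label at every position, so it receives a learning signal from all input tokens rather than only from a masked subset.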
Acknowledgments:
This research was supported by the University Research Board (URB) at the American University of Beirut (AUB), and by the TFRC program, which we thank for the free access to cloud TPUs. We also thank As-Safir newspaper for the data access.