Tools & Resources
Tool ID | Reference ID | Tool Name | Descriptive keywords | Purpose | Category | Sub-category | Languages | Link |
---|---|---|---|---|---|---|---|---|
1_1 | 1 | Tagger(POS) | Part of speech tagging | POS Tagging | Basic Language Analysis | Syntax Parsing | Egyptian, Gulf, Maghrebi, Levantine | https://github.com/qcri/dialectal_arabic_pos_tagger |
1_2 | 1 | Orthography Guideline | Conventional orthography | Orthographic Consistency | Basic Language Analysis | Orthographic analysis | Arabic Dialects | https://github.com/CAMeL-Lab/camel-guidelines/blob/master/docs/orthography.md |
1_3 | 1 | ADAM | Morphological analyzer | Morphological Analysis | Basic Language Analysis | Morphological Analysis | Levantine, Egyptian | https://github.com/WaelSalloum/adam |
1_4 | 1 | CALIMA STAR | Morphological analyzer | Morphological Analysis | Basic Language Analysis | Morphological Analysis | Gulf | https://calimastar.abudhabi.nyu.edu/analyzer/ |
1_5 | 1 | Tunisian Arabic morphological analyzer evaluation corpus | Morphological Analyzer evaluation corpus | Morphological Analysis | Lexical Resources | Corpus | Tunisian | https://github.com/NadiaBMKarmani/Intelligent-Tunisian-Arabic-Morphological-Analyzer-evaluation-corpus |
1_6 | 1 | morphological,pos annotation | Morphological analyzer Corpus | Morphological Analysis | Lexical Resources | Corpus | Gulf | https://camel.abudhabi.nyu.edu/gumar/ |
1_7 | 1 | morphological disambiguation | Morphological analyzer | Morphological Analysis | Basic Language Analysis | Morphological Analysis | Egyptian, MSA | https://camel.abudhabi.nyu.edu/madamira/ |
1_8 | 1 | Automatic Arabic Dialect Detection Task | dialect identification | Language Identification | Language Identification | Language Identification | MSA, Levantine, North African, Egyptian | https://github.com/qcri/dialectID |
1_9 | 1 | Arabic Corpora | Habibi, Kalimat and others | Multipurpose | Lexical Resources | Database | Multi | http://www.lancaster.ac.uk/staff/elhaj/corpora.htm |
1_10 | 1 | Detecting Arabic Dialects | dialect identification | Language Identification | Language Identification | Language Identification | Arabic Dialects, MSA | https://github.com/drelhaj/ArabicDialects |
1_11 | 1 | End-to-end Dialect Identification | dialect identification | Language Identification | Language Identification | Language Identification | MSA, Levantine, North African, Egyptian | https://github.com/swshon/dialectID_e2e |
1_12 | 1 | DarijaBERT | Moroccan Model, Language identification, sentiment analysis | Language Identification | Language Identification | Language Identification | Maghrebi, MSA | https://github.com/AIOXLABS/DBert |
1_13 | 1 | arabic-dialect-identification | Arabic dialect speech language identification | Language Identification | Language Identification | Language Identification | Arabic Dialects | https://github.com/swshon/arabic-dialect-identification |
1_14 | 1 | BOLT Egyptian Arabic Treebank - Discussion Forum | Multipurpose | Multipurpose | Lexical Resources | Treebank | Egyptian | https://catalog.ldc.upenn.edu/LDC2018T23 |
1_15 | 1 | Masader | online catalogue for Arabic NLP datasets | Multipurpose | Lexical Resources | Database | Arabic Dialects | https://github.com/ARBML/masader |
1_16 | 1 | Saudi corpus - SDC | Saudi dialect corpus | Multipurpose | Lexical Resources | Corpus | Saudi | https://github.com/TaghreedT/SDC |
1_17 | 1 | PADIC | Parallel Arabic dialect, Dialect translation | Translation | Lexical Resources | Corpus | Arabic Dialects | http://smart.loria.fr/pmwiki/pmwiki.php/PmWiki/Corpora |
1_18 | 1 | WikiDocsAligner | Lexical resources | Multipurpose | Lexical Resources | Corpus | Egyptian, MSA | https://github.com/motazsaad/WikiDocsAligner |
1_19 | 1 | Comparable corpus | Lexical resources | Multipurpose | Lexical Resources | Corpus | Egyptian, MSA | https://github.com/motazsaad/comparableWikiCoprus |
1_20 | 1 | Madar parallel corpus and lexicon | Lexical resources | Translation | Lexical Resources | Corpus | Multi | https://camel.abudhabi.nyu.edu/madar-parallel-corpus/? |
1_21 | 1 | DART corpus | Lexical resources | Multipurpose | Lexical Resources | Corpus | Gulf, Iraqi, Levantine, Maghrebi, Egyptian | https://www.dropbox.com/s/jslg6fzxeu47flu/DART.zip?dl=0 |
1_22 | 1 | Arabic Multidialectal Word Embeddings | word embeddings learned from different dialects of Arabic | Feature Engineering | Feature engineering | Word embeddings | Arabic Dialects | https://camel.abudhabi.nyu.edu/arabic-multidialectal-embeddings/ |
1_23 | 1 | darija-dictionary | Dictionary | Translation | Lexical Resources | Dictionary | Maghrebi, English | https://github.com/darija-open-dataset/dataset |
1_24 | 1 | Comparable Algerian corpus | Algerian corpus | Multipurpose | Lexical Resources | Corpus | Algerian, MSA, French, English | https://github.com/abidikarima/CALYOU |
1_25 | 1 | Corpus | Palestinian corpus | Multipurpose | Lexical Resources | Corpus | Palestinian | http://portal.sina.birzeit.edu/curras/download.html |
1_26 | 1 | ArabicWeb16 | Arabic Web collection | Multipurpose | Lexical Resources | Corpus | Arabic Dialects, MSA | http://qufaculty.qu.edu.qa/telsayed/arabicweb16/ |
1_27 | 1 | Extraction tweet code | Sentiment analysis on Arabic tweets | Sentiment Analysis | Semantic Analysis | Text classification | Arabic Dialects, MSA | https://github.com/alazraq/arabic-nlp |
1_28 | 1 | super-parallel corpora | Parallel corpus | Translation | Lexical Resources | Corpus | Multi | https://github.com/ehsanasgari/1000Langs |
1_29 | 1 | Tunisian Sentiment Analysis Corpus - TSAC | Sentiment Analysis Corpus | Sentiment Analysis | Lexical Resources | Corpus | Tunisian | https://github.com/fbougares/TSAC |
1_30 | 1 | DzSenti | code for sentiment analysis, corpus | Sentiment Analysis | Semantic Analysis | Text classification | Algerian | https://github.com/adelabdelli/DzSentiA |
1_31 | 1 | Sentiment-analysis-of-riyadh-season-events | Sentiment Analysis | Sentiment Analysis | Lexical Resources | Corpus | Saudi | https://github.com/Yasalm/Sentiment-analysis-of-riyadh-season-events |
1_32 | 1 | oeadalg | Sentiment analysis | Sentiment Analysis | Lexical Resources | Corpus | Algerian | https://github.com/kinmokusu/oea_algd |
1_33 | 1 | ARLSTem | Stemmer algorithm | Morphological Analysis | Basic Language Analysis | Morphological Analysis | MSA | https://github.com/xprogramer/Arabic-Stemmers/blob/master/ARLSTem.py |
1_34 | 1 | YAMAMA | Dialect Arabic Morphological Analyzer | Morphological Analysis | Basic Language Analysis | Morphological Analysis | Arabic Dialects | https://nyuad.nyu.edu/en/research/centers-labs-and-projects/computational-approaches-to-modeling-language-lab/resources.html |
1_35 | 1 | Parser | Arabic Syntactic Analysis and Morphological Disambiguation | Multipurpose | Basic Language Analysis | Syntax Parsing | MSA | https://camel.abudhabi.nyu.edu/camelparser/ |
1_36 | 1 | NUDAR | treebank of texts annotated in the Universal Dependency syntactic representation. | Syntactic Analysis | Lexical Resources | Treebank | MSA | https://nyuad.nyu.edu/en/research/faculty-labs-and-projects/computational-approaches-to-modeling-language-lab/research/arabic-natural-language-processing.html |
1_37 | 1 | CoNLL-UL | Universal Morphological Lattices for Universal Dependency Parsing | Syntactic Analysis | Basic Language Analysis | Morphological Analysis | MSA | https://github.com/conllul/conllul.github.io |
1_38 | 1 | Arabic News Article Classification | corpus, text classification | Text Classification | Semantic Analysis | Text classification | MSA | https://github.com/saidziani/Arabic-News-Article-Classification |
1_39 | 1 | Sinai Corpus | tagged sentences | Multipurpose | Lexical Resources | Corpus | MSA | https://github.com/mohabmes/Sinai-corpus |
1_40 | 1 | Prague Arabic Dependency Treebank | multi-level linguistic annotations over the language of Modern Standard Arabic | Multipurpose | Lexical Resources | Treebank | MSA | https://ufal.mff.cuni.cz/padt/PADT_1.0/docs/index.html |
1_41 | 1 | United Nations Parallel Corpus | six-language parallel corpus, Machine Translation | Translation | Lexical Resources | Corpus | Multi | https://conferences.unite.un.org/UNCorpus |
1_42 | 1 | Parallel corpus (60 languages including MSA) | 65 languages | Translation | Lexical Resources | Corpus | Multi | http://opus.nlpl.eu/OpenSubtitles2016.php |
1_43 | 1 | TUFS Media Corpus | Parallel corpus | Translation | Lexical Resources | Corpus | Multi | http://el.tufs.ac.jp/tufsmedia-corpus/ |
1_44 | 1 | Sentiment corpus | Labeled corpus | Sentiment Analysis | Lexical Resources | Corpus | MSA | https://github.com/iamaziz/ar-embeddings/tree/master/datasets |
1_45 | 1 | LABR: A Large-Scale Arabic Book Reviews Dataset | Large-Scale Arabic Book Reviews Dataset | Multipurpose | Lexical Resources | Corpus | MSA | https://github.com/mahmoudnabil/labr |
1_46 | 1 | Character-Aware Neural Language Models | Character-Aware Neural Language Models | Language Modeling | Language Modeling | Language Model | Multi | https://github.com/yoonkim/lstm-char-cnn |
1_47 | 1 | Quranic corpus | Classical Arabic Corpus | Multipurpose | Lexical Resources | Corpus | CA | http://corpus.quran.com/ |
1_48 | 1 | Shamela corpus | Online Library | Multipurpose | Lexical Resources | Database | CA, MSA | https://shamela.ws/ |
1_49 | 1 | Quranic corpus | corpus | Multipurpose | Lexical Resources | Corpus | CA | http://textminingthequran.com/ |
1_50 | 1 | QuranAnalysis (QA) Project | Semantic Search and Intelligence System for the Quran | Information retrieval | Semantic Analysis | Information retrieval | CA | https://github.com/karimouda/qurananalysis |
1_51 | 1 | Translation of Quran | corpus | Translation | Lexical Resources | Corpus | CA | http://tanzil.net/trans/ |
3_1 | 3 | Arabic emoticon lexicon | Sentiment analysis lexicon | Sentiment Analysis | Lexical Resources | Lexicon | Multi | https://www.saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm |
3_2 | 3 | ArSenL | Sentiment analysis lexicon | Sentiment Analysis | Lexical Resources | Lexicon | MSA | http://oma-project.com/ArSenL/download_intro |
3_3 | 3 | NileULex | Sentiment analysis lexicon | Sentiment Analysis | Lexical Resources | Lexicon | Egyptian, MSA | https://github.com/NileTMRG/NileULex |
3_4 | 3 | ASTD + Python code | Sentiment analysis dataset | Sentiment Analysis | Lexical Resources | Corpus | MSA | https://github.com/mahmoudnabil/ASTD |
3_5 | 3 | Large Multi-Domain Resources for Arabic Sentiment Analysis | Sentiment analysis dataset | Sentiment Analysis | Lexical Resources | Database | MSA | https://github.com/hadyelsahar/large-arabic-sentiment-analysis-resouces |
3_6 | 3 | ArTwitter | Sentiment analysis dataset | Sentiment Analysis | Lexical Resources | Corpus | MSA, Jordanian | https://archive.ics.uci.edu/ml/datasets/Twitter+Data+set+for+Arabic+Sentiment+Analysis |
4_1 | 4 | Tim Buckwalter’s morphological analyzer | Morphological analyzer | Morphological Analysis | Basic Language Analysis | Morphological Analysis | MSA | http://www.qamus.org |
4_2 | 4 | Sarf engine | engine that can generate Arabic verbs, nouns, gerunds, adjectives from their roots. | Morphological Analysis | Basic Language Analysis | Morphological Analysis | MSA | http://sourceforge.net/projects/sarf |
5_1 | 5 | Al-Hayat Arabic Corpus | for information retrieval purposes | Information retrieval | Lexical Resources | Corpus | MSA | http://catalog.elra.info/product_info.php?products_id=632 |
5_2 | 5 | Amara | open multilingual collection of subtitles for educational videos | Translation | Lexical Resources | Corpus | Multi | http://alt.qcri.org/resources/qedcorpus/ |
5_3 | 5 | Arab-Acquis | machine translation between Arabic and 22 European languages | Translation | Lexical Resources | Corpus | Multi | https://camel.abudhabi.nyu.edu/arabacquis/ |
5_4 | 5 | Arabic English Parallel News | parallel corpus, Machine translation | Translation | Lexical Resources | Corpus | MSA, English | https://catalog.ldc.upenn.edu/LDC2004T18 |
5_5 | 5 | Arabic Treebank | automatic content extraction, cross-lingual information retrieval, information detection | Multipurpose | Lexical Resources | Treebank | MSA | https://catalog.ldc.upenn.edu/LDC2005T20 |
5_6 | 5 | Aranea Web Corpora | corpora | Multipurpose | Lexical Resources | Database | Multi | http://unesco.uniba.sk/guest/ |
5_7 | 5 | arTenTen | pos, part of speech tagging | Multipurpose | Lexical Resources | Corpus | Multi | https://www.sketchengine.co.uk/artenten-corpus/ |
5_8 | 5 | KSUCCA | 50 million tokens annotated corpus of Classical Arabic | Multipurpose | Lexical Resources | Corpus | CA | https://sourceforge.net/projects/ksucca-corpus/ |
5_9 | 5 | Multilingual Multi-Document Summarization Corpus | summarization corpus | Text Summarization | Lexical Resources | Corpus | Multi | http://multiling.iit.demokritos.gr/pages/view/1540/task-mms-multi-document-summarization-data-and-information |
5_10 | 5 | NEMLAR Corpus | Arabic text from 13 different domains | Multipurpose | Lexical Resources | Corpus | MSA | https://old.linguistlist.org/issues/17/17-2368.html |
5_11 | 5 | OntoNotes | Multilingual Corpora | Multipurpose | Lexical Resources | Corpus | Multi | https://catalog.ldc.upenn.edu/LDC2013T19 |
5_12 | 5 | The International Corpus of Arabic | corpora | Multipurpose | Lexical Resources | Corpus | MSA | http://www.bibalex.org/ica |
5_13 | 5 | WIT3 | Web Inventory of Transcribed and Translated Talks | Multipurpose | Lexical Resources | Database | Multi | https://wit3.fbk.eu/ |
6_1 | 6 | Ajdir Corpora | Monolingual Corpora | Multipurpose | Lexical Resources | Corpus | MSA | http://aracorpus.e3rab.com/argistestsrv.nmsu.edu/AraCorpus/ |
6_2 | 6 | OSAC | Monolingual Corpora | Multipurpose | Lexical Resources | Corpus | MSA | https://sourceforge.net/projects/ar-text-mining/files/Arabic-Corpora/ |
6_3 | 6 | Alwatan | Monolingual Corpora | Multipurpose | Lexical Resources | Corpus | MSA | http://sourceforge.net/projects/arabiccorpus/ |
6_4 | 6 | Tashkeela | Monolingual Corpora | Diacritization | Lexical Resources | Corpus | MSA | http://sourceforge.net/projects/tashkeela/ |
6_5 | 6 | Al Khaleej | Monolingual Corpora | Multipurpose | Lexical Resources | Corpus | MSA | http://sourceforge.net/projects/arabiccorpus/ |
6_6 | 6 | KACST Arabic Newspaper Corpus | Monolingual Corpora | Multipurpose | Lexical Resources | Corpus | MSA | http://sourceforge.net/projects/kacst-acptool/files/?source=navbar |
6_7 | 6 | Arabic Words Corpora | Monolingual Corpora | Multipurpose | Lexical Resources | Corpus | MSA | http://sourceforge.net/projects/arabicwordcorpu/files/ |
6_8 | 6 | Corpus of Contemporary Arabic | Monolingual Corpora | Multipurpose | Lexical Resources | Corpus | MSA | http://shachi.org/resources/4051 |
6_9 | 6 | CRI KACST Arabic Corpus | Monolingual Corpora | Multipurpose | Lexical Resources | Corpus | MSA | http://cri.kacst.edu.sa/Resources/TRN_DB.rar |
6_10 | 6 | Arabic Learners Written Corpus | Monolingual Corpora | Multipurpose | Lexical Resources | Database | MSA | https://cercll.arizona.edu/arabic-corpus/ |
6_11 | 6 | MEEDAN Translation Memory | Multilingual Corpora | Translation | Lexical Resources | Corpus | MSA, English | https://github.com/anastaw/Meedan-Memory |
6_12 | 6 | Tunisian Dialect Corpus (TuDiCoI) | Dialectal Corpora | Multipurpose | Lexical Resources | Corpus | Tunisian | https://sites.google.com/site/anlprg/outils-et-corpus-realises/TuDiCoIV1.xml?attredirects=0 |
6_13 | 6 | KACST Arabic Corpus | Web based Corpora | Multipurpose | Lexical Resources | Database | MSA | https://corpus.kacst.edu.sa/ |
6_14 | 6 | Leeds Arabic Internet Corpus | Web based Corpora | Multipurpose | Lexical Resources | Database | Multi | http://corpus.leeds.ac.uk/query-ar.html |
6_15 | 6 | ArabiCorpus | Web based Corpora | Multipurpose | Lexical Resources | Corpus | MSA | http://arabicorpus.byu.edu/ |
6_16 | 6 | QURANY | Web based Corpora | Multipurpose | Lexical Resources | Corpus | CA | https://corpus.quran.com/ |
6_17 | 6 | ANERCorp | Named Entity Corpora | Named Entity Recognition | Lexical Resources | Corpus | MSA | http://curtis.ml.cmu.edu/w/courses/index.php/ANERcorp |
6_18 | 6 | AQMAR Named Entity Corpus | Named Entity Corpora | Named Entity Recognition | Lexical Resources | Corpus | MSA | http://www.ark.cs.cmu.edu/ArabicNER/ |
6_19 | 6 | Named Entity Translation Lexicon | Named Entity Corpora | Named Entity Recognition | Lexical Resources | Corpus | MSA | http://nlp.qatar.cmu.edu/resources/NETLexicon/ |
6_20 | 6 | Named Entities List | Named Entity Corpora | Named Entity Recognition | Lexical Resources | Gazetteer | MSA | https://sourceforge.net/projects/arabicnes/ |
6_21 | 6 | ANERGazet | Named Entity Corpora | Named Entity Recognition | Lexical Resources | Gazetteer | MSA | http://curtis.ml.cmu.edu/w/courses/index.php/ANERgazet |
6_22 | 6 | Qatar Arabic Language Bank (QALB) | Error-Annotated Corpora | Orthographic Consistency | Lexical Resources | Corpus | MSA | http://nlp.qatar.cmu.edu/qalb/sharedtask/shared_task.html |
6_23 | 6 | Arabic Learner Corpus | Error-Annotated Corpora | Orthographic Consistency | Lexical Resources | Corpus | MSA, Saudi | https://www.arabiclearnercorpus.com/ |
6_24 | 6 | The Quranic Arabic Corpus | Miscellaneous Annotated Corpora | Multipurpose | Lexical Resources | Corpus | CA | http://corpus.quran.com/download/ |
6_25 | 6 | AQMAR Arabic Wikipedia Supersense Corpus | Miscellaneous Annotated Corpora | Nominal Supersenses | Lexical Resources | Corpus | MSA | http://www.ark.cs.cmu.edu/ArabicSST/ |
6_26 | 6 | Khoja POS tagged corpus | Miscellaneous Annotated Corpora | Multipurpose | Lexical Resources | Corpus | MSA | http://zeus.cs.pacificu.edu/shereen/research.htm#corpora |
6_27 | 6 | Arabic Wikipedia Dependency Corpus | Miscellaneous Annotated Corpora, syntax dependency | Multipurpose | Lexical Resources | Corpus | MSA | http://www.ark.cs.cmu.edu/ArabicDeps/ |
6_28 | 6 | BAMA 1.0 English-Arabic Lexicon | Lexical Databases List, POS tagging dataset | Multipurpose | Lexical Resources | Lexicon | Multi | http://catalog.ldc.upenn.edu/LDC2002L49 |
6_29 | 6 | Arabic-English Learner's Dictionary | Lexical Databases List. | Multipurpose | Lexical Resources | Dictionary | Multi | http://www.perseus.tufts.edu/hopper/opensource/download |
6_30 | 6 | Unitex Arabic Package | Lexical Databases List. | Multipurpose | Lexical Resources | Corpus | MSA | http://www-igm.univ-mlv.fr/~unitex/index.php?page=3&htm |
6_31 | 6 | ARALEX Online | Lexical Databases List. | Multipurpose | Lexical Resources | Lexicon | MSA | https://aralex.mrc-cbu.cam.ac.uk/aralex.online/login.jsp |
6_32 | 6 | AraComLex Arabic Lexical Database | Lexical Databases List. | Morphological Analysis | Lexical Resources | Database | MSA | http://sourceforge.net/projects/aracomlex/files/ |
6_33 | 6 | Arabic VerbNet | Lexical Databases List. | Multipurpose | Lexical Resources | VerbNet | MSA | https://github.com/JaouadMousser/Arabic-Verbnet |
6_34 | 6 | Arabic WordNet | Lexical Databases List. | Multipurpose | Lexical Resources | WordNet | MSA, English | http://sourceforge.net/projects/awnbrowser/ |
6_35 | 6 | NOOJ Arabic Dictionary | Lexical Databases List. | Multipurpose | Lexical Resources | Database | MSA | https://site-nooj.blogspot.com/p/arabic-tutorials.html |
6_36 | 6 | Word Count of Modern Standard Arabic | List of Words Lists | Multipurpose | Lexical Resources | List | MSA | http://arabicwordcount.sourceforge.net/ |
6_37 | 6 | Arabic Wordlist for Spellchecking | List of Words Lists | Orthographic Consistency | Lexical Resources | List | MSA | http://sourceforge.net/projects/arabic-wordlist/ |
6_38 | 6 | Multiword Expressions | List of Words Lists | Preprocessing | Lexical Resources | List | MSA | https://sourceforge.net/projects/arabicmwes/ |
6_39 | 6 | Arabic Unknown Words | List of Words Lists | Preprocessing | Lexical Resources | List | MSA | http://arabic-unknowns.sourceforge.net/ |
6_40 | 6 | Arabic Stop words | List of Words Lists | Preprocessing | Lexical Resources | List | MSA | http://sourceforge.net/projects/arabicstopwords/ |
6_41 | 6 | Obsolete Arabic Words | List of Words Lists | Preprocessing | Lexical Resources | List | MSA | http://obsoletearabic.sourceforge.net/ |
6_42 | 6 | Arabic Broken Plurals | List of Words Lists | Multipurpose | Lexical Resources | List | MSA | http://broken-plurals.sourceforge.net/ |
6_43 | 6 | AFEWC and Enews Comparable Corpora | Miscellaneous Corpora Types | Multipurpose | Lexical Resources | Corpus | MSA, French, English | http://sourceforge.net/projects/crlcl/ |
6_44 | 6 | InAra (a corpus for Arabic Intrinsic plagiarism detection evaluation) | plagiarism detection corpus | Content Moderation | Lexical Resources | Corpus | MSA | https://sourceforge.net/projects/inaracorpus/ |
6_45 | 6 | Essex Arabic Summaries Corpus | Miscellaneous Corpora Types | Text Summarization | Lexical Resources | Corpus | MSA | http://sourceforge.net/projects/easc-corpus/ |
6_46 | 6 | KALIMAT Multi-Purpose Corpus | Miscellaneous Corpora Types | Multipurpose | Lexical Resources | Corpus | MSA | http://sourceforge.net/projects/kalimat/ |
8_1 | 8 | Arabic text recognition competition of ICDAR 2013 | text recognition | Computer Vision | Lexical Resources | Corpus | MSA | https://diuf.unifr.ch/main/diva/APTI/download.html |
8_2 | 8 | Arabic handwritten ancient manuscripts called AVICENNA. | Handwriting Recognition Corpora | Computer Vision | Lexical Resources | Corpus | CA | http://www.causality.inf.ethz.ch/ul_data/AVICENNA.html |
8_3 | 8 | The IIIT Arabic scene text dataset | Recognizing Arabic text in videos | Computer Vision | Lexical Resources | Corpus | MSA | https://cvit.iiit.ac.in/research/projects/cvit-projects/arabic-text-recognition |
8_4 | 8 | Tesseract | OCR | Computer Vision | NLP Toolkit | NLP Toolkit | MSA | https://github.com/tesseract-ocr/tesseract |
8_5 | 8 | Script identification | Scene Text Script Identification | Computer Vision | NLP Toolkit | NLP Toolkit | Multi | https://github.com/lluisgomez/script_identification |
8_6 | 8 | Video Script Identification Competition (CVSI-2015) dataset | Video script | Computer Vision | Lexical Resources | Corpus | Multi | http://www.ict.griffith.edu.au/cvsi2015/ |
8_7 | 8 | 2016 Arabic multi-genre broadcast (MGB) challenge | audio, Multipurpose lexical resources | Multipurpose | Lexical Resources | Corpus | MSA | http://www.mgb-challenge.org/before20190909/arabic_download.html |
8_8 | 8 | character-level NN for the Arabic dialects identification task of the DSL challenge | Dialect identification | Language Identification | Language Identification | Language Identification | Multi | https://github.com/boknilev/dsl-char-cnn |
8_9 | 8 | dialect datasets | Dialect identification, POS | Multipurpose | Lexical Resources | Corpus | Arabic Dialects | http://alt.qcri.org/resources/da_resources/ |
8_10 | 8 | sentiment analysis using word embeddings | sentiment analysis code | Sentiment Analysis | Semantic Analysis | Text Classification | MSA | https://github.com/iamaziz/ar-embeddings |
8_11 | 8 | sentiment analysis dataset comparison | sentiment analysis dataset | Sentiment Analysis | Lexical Resources | Corpus | MSA | http://saifmohammad.com/WebPages/ArabicSA.html |
9_1 | 9 | ANT | Multipurpose corpora | Multipurpose | Lexical Resources | Corpus | MSA | https://antcorpus.github.io/ |
9_2 | 9 | CANER | Multipurpose corpora | Multipurpose | Lexical Resources | Corpus | CA | https://github.com/RamziSalah/Classical-Arabic-Named-Entity-Recognition-Corpus |
9_3 | 9 | PAAD | Multipurpose corpora | Multipurpose | Lexical Resources | Corpus | MSA | https://data.mendeley.com/datasets/spvbf5bgjs/2 |
9_4 | 9 | RCATS | Multipurpose corpora | Multipurpose | Lexical Resources | Corpus | MSA | https://fstf.fst-usmba.ac.ma/laboratoires/lsia/RCATSS/index.html |
9_5 | 9 | ANS | Multipurpose corpora | Multipurpose | Lexical Resources | Corpus | MSA | https://github.com/latynt/ans |
9_6 | 9 | DZDC12 | Multipurpose corpora | Multipurpose | Lexical Resources | Corpus | MSA | https://bit.ly/3uqX6bb |
9_7 | 9 | Kunuz | Multipurpose corpora | Multipurpose | Lexical Resources | Corpus | CA | http://jarir.tn/kunuzcorpus |
9_8 | 9 | OpenITI-proc corpus | Multipurpose corpora | Multipurpose | Lexical Resources | Corpus | CA | https://zenodo.org/record/2535593#.Yvo5EXZByHt |
9_9 | 9 | N/A | Multipurpose corpora | Multipurpose | Lexical Resources | Corpus | MSA | http://www.cs.cmu.edu/~fraisi/arabic/arparallel/ |
9_10 | 9 | N/A | Multipurpose corpora | Multipurpose | Lexical Resources | Lexicon | MSA, English | https://github.com/Hkiri-Emna/Named_Entities_Lexicon_Project |
9_11 | 9 | The Saudi Dialect Irony Dataset (Sa`7r ساخر) | Multipurpose corpora | Multipurpose | Lexical Resources | Lexicon | Saudi | https://github.com/iwan-rg/Saudi-Dialect-Irony-Dataset |
9_12 | 9 | MADAR | Dialectal Corpora | Multipurpose | Lexical Resources | Corpus | Multi | https://sites.google.com/nyu.edu/madar/ |
9_13 | 9 | Arabic Hate Speech Dataset | Hate speech detection | Content Moderation | Lexical Resources | Corpus | Arabic Dialects | https://github.com/sbalsefri/ArabicHateSpeechDataset |
9_14 | 9 | MADAR | Spelling correction corpus | Orthographic Consistency | Lexical Resources | Corpus | Multi | https://nyuad.nyu.edu/en/research/faculty-labs-and-projects/computational-approaches-to-modeling-language-lab/resources.html |
9_15 | 9 | DAICT | ARABIC IRONY CORPUS | Irony Detection | Lexical Resources | Corpus | MSA, Arabic Dialects | https://www.hbku.edu.qa/en/DAICT |
9_16 | 9 | Shami | Dialectal Corpora | Multipurpose | Lexical Resources | Corpus | Syrian | https://github.com/GU-CLASP/shami-corpus |
9_17 | 9 | N/A | Sentiment Analysis Corpora | Sentiment Analysis | Lexical Resources | Corpus | MSA | https://rb.gy/vea9g7 |
9_18 | 9 | RSAC | Sentiment Analysis Corpora | Sentiment Analysis | Lexical Resources | Lexicon | MSA | https://github.com/asooft/Sentiment-Analysis-Hotel-Reviews-Dataset |
9_19 | 9 | Moarlex | Sentiment Analysis Corpora | Sentiment Analysis | Lexical Resources | Lexicon | MSA, Egyptian | https://github.com/Mohabyoussef09/MoArLex |
9_20 | 9 | AraSenti Lexicon | Sentiment Analysis Corpora | Sentiment Analysis | Lexical Resources | Lexicon | MSA | https://github.com/nora-twairesh/AraSenti |
9_21 | 9 | Multi-domain Arabic Sentiment Corpus (MASC) | Sentiment Analysis Corpora | Sentiment Analysis | Lexical Resources | Lexicon | MSA | https://github.com/almoslmi/masc |
9_22 | 9 | The Arabic Dialect Identification for 17 countries (ADI17) Dataset | Speech Corpora | Language Identification | Lexical Resources | Corpus | Arabic Dialects | https://bit.ly/3kon1vo |
9_23 | 9 | Arabic Dialect Identification Corpora | Speech Corpora | Language Identification | Lexical Resources | Corpus | Multi | https://www.kaggle.com/datasets/corpora4research/arpod-corpus-based-on-arabic-podcasts |
9_24 | 9 | SmartATID | Image Corpora | Computer Vision | Lexical Resources | Corpus | MSA | https://sites.google.com/site/smartatid/ |
9_25 | 9 | ASAYAR | Image Corpora | Computer Vision | Lexical Resources | Corpus | Maghrebi | https://vcar.github.io/ASAYAR/ |
11_1 | 11 | HARD: Hotel Arabic-Reviews Dataset | hotel reviews dataset | Multipurpose | Lexical Resources | Corpus | MSA, Arabic Dialects | https://github.com/elnagara/HARD-Arabic-Dataset |
11_2 | 11 | BRAD: Books Reviews in Arabic Dataset | Book reviews dataset | Multipurpose | Lexical Resources | Corpus | MSA, Arabic Dialects | https://github.com/elnagara/BRAD-Arabic-Dataset |
11_3 | 11 | AOC dataset | Corpus | Multipurpose | Lexical Resources | Corpus | MSA | https://github.com/sjeblee/AOC/blob/master/stuff-from-omar/AOC_readme.txt |
11_4 | 11 | Nuanced Arabic Dialect Identification Shared Task Series (NADI) | Dialect identification | Language Identification | Language Identification | Language Identification | MSA, Arabic Dialects | https://github.com/UBC-NLP/nadi |
11_5 | 11 | BBN/AUB DARPA Babylon Levantine Arabic Speech and Transcripts | Speech recognition, speech to speech translation | Multipurpose | Lexical Resources | Corpus | Levantine | https://catalog.ldc.upenn.edu/LDC2005S08 |
11_6 | 11 | Spoken Arabic Regional Archive (SARA) | Spoken Arabic dialects | Multipurpose | Lexical Resources | Database | Arabic Dialects | https://data.mendeley.com/datasets/btfx5pw2rm/2 |
13_1 | 13 | The open parallel corpus | collection of translated texts from the web | Multipurpose | Lexical Resources | Corpus | Multi | http://opus.nlpl.eu/ |
13_2 | 13 | OpenNMT | An open source neural machine translation system. | Translation | NLP Toolkit | NLP Toolkit | Multi | https://opennmt.net/ |
13_3 | 13 | Fairseq | a sequence modeling toolkit | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | https://github.com/facebookresearch/fairseq |
13_4 | 13 | Tensor2Tensor | a library of deep learning models and datasets | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | https://github.com/tensorflow/tensor2tensor |
13_5 | 13 | Moses | statistical machine translation system | Translation | NLP Toolkit | NLP Toolkit | Multi | http://www2.statmt.org/moses/ |
13_6 | 13 | Phrasal | A statistical machine translation system | Translation | NLP Toolkit | NLP Toolkit | Multi | https://github.com/stanfordnlp/phrasal |
13_7 | 13 | Subword-nmt | Subword Neural Machine Translation | Morphological Analysis | NLP Toolkit | NLP Toolkit | Multi | https://github.com/rsennrich/subword-nmt |
13_8 | 13 | SentencePiece | an unsupervised text tokenizer and detokenizer | Morphological Analysis | NLP Toolkit | NLP Toolkit | Multi | https://github.com/google/sentencepiece |
13_9 | 13 | GIZA++ | Statistical Machine Translation | Translation | NLP Toolkit | NLP Toolkit | Multi | https://github.com/moses-smt/giza-pp |
13_10 | 13 | FastText Multilingual | fastText vectors of 78 languages | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | https://github.com/babylonhealth/fastText_multilingual |
13_11 | 13 | Gensim | Topic modelling | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | https://radimrehurek.com/gensim/ |
13_12 | 13 | FastText | Pre-trained word vectors | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | https://github.com/facebookresearch/fastText/blob/master/docs/crawl-vectors.md |
13_13 | 13 | NLG evaluation | Evaluation code for various unsupervised automated metrics for NLG | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | https://github.com/Maluuba/nlg-eval |
20_1 | 20 | Adawat | NLP software | Multipurpose | NLP Toolkit | NLP Toolkit | Arabic Dialects | http://adawat.sourceforge.net/ |
20_2 | 20 | Salma AI | Financial Chatbot | Chatbot | Semantic Analysis | Dialogue system | MSA | https://salma.ai/home |
20_3 | 20 | Adam AI | Islam Chatbot | Chatbot | Semantic Analysis | Dialogue system | MSA | https://iadam.ai/ |
20_4 | 20 | Arabic Tools | NLP software | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | https://www.arabitools.com/ |
20_5 | 20 | UIMA | NLP software | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | https://uima.apache.org/d/uimaj-current/ |
20_6 | 20 | Safar | NLP software | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | http://arabic.emi.ac.ma/safar/ |
25_1 | 25 | ACE 2004 Multilingual Training Corpus | Multilingual Corpora | Multipurpose | Lexical Resources | Corpus | Multi | https://catalog.ldc.upenn.edu/LDC2005T09 |
25_2 | 25 | Arabic NLP Lexicons | Arabic lexical resources | Multipurpose | Lexical Resources | Corpus | MSA | https://www.cjk.org/data/arabic/nlp/ |
25_3 | 25 | The General Architecture for Text Engineering GATE | NLP software | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | http://gate.ac.uk/ |
25_4 | 25 | LingPipe A toolkit for text engineering and processing | NLP software | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | http://alias-i.com/lingpipe/ |
25_5 | 25 | Yasmet | maximum entropy models | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | http://www-i6.informatik.rwth-aachen.de/web/Software/YASMET.html |
25_6 | 25 | CRF++ | NLP software | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | https://taku910.github.io/crfpp/ |
25_7 | 25 | Yamcha | Multipurpose CHunk Annotator | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | http://chasen.org/~taku/software/yamcha/ |
25_8 | 25 | Weka | Machine Learning Software in Java | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | https://www.cs.waikato.ac.nz/ml/weka/ |
25_9 | 25 | NetOwl Extractor | Named Entity Extractor | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | https://www.netowl.com/entity-extraction |
25_10 | 25 | About Gazetteer | Lexical resources | Named Entity Recognition | Lexical Resources | Gazetteer | Multi | https://dbpedia.org/page/Gazetteer |
29_1 | 29 | MADAMIRA | Package, NLP Toolkit | Multipurpose | NLP Toolkit | NLP Toolkit | MSA | https://camel.abudhabi.nyu.edu/madamira/ |
29_2 | 29 | FARASA | Package, NLP Toolkit | Multipurpose | NLP Toolkit | NLP Toolkit | MSA | https://farasa.qcri.org/ |
29_3 | 29 | CAMeL | Package, NLP Toolkit | Multipurpose | NLP Toolkit | NLP Toolkit | MSA | https://camel-tools.readthedocs.io/en/latest/ |
29_4 | 29 | ARBML | Package, NLP Toolkit | Multipurpose | NLP Toolkit | NLP Toolkit | MSA | https://github.com/ARBML/ARBML |
29_5 | 29 | CoreNLP | Package, NLP Toolkit | Multipurpose | NLP Toolkit | NLP Toolkit | MSA | https://stanfordnlp.github.io/CoreNLP/ |
29_6 | 29 | UDPipe | Package, NLP Toolkit | Multipurpose | NLP Toolkit | NLP Toolkit | MSA | https://cran.r-project.org/web/packages/udpipe/index.html |
29_7 | 29 | Stanza | Package, NLP Toolkit | Multipurpose | NLP Toolkit | NLP Toolkit | MSA | https://stanfordnlp.github.io/stanza/ |
29_8 | 29 | Trankit | Package, NLP Toolkit | Multipurpose | NLP Toolkit | NLP Toolkit | MSA | https://github.com/nlp-uoregon/trankit |
30_1 | 30 | AraGPT2 | language generation model | Multipurpose | Language Modeling | Language Model | MSA, Arabic Dialects | https://github.com/aub-mind/arabert/tree/master/aragpt2 |
30_2 | 30 | AraBERT | Language Understanding and Generation | Multipurpose | Feature engineering | Word embeddings | MSA, Arabic Dialects | https://github.com/aub-mind/arabert |
36_1 | 36 | Lemur Toolkit | NLP software | Multipurpose | NLP Toolkit | NLP Toolkit | Multi | https://www.lemurproject.org/lemur.php |
36_2 | 36 | Arabic WordNet | wordnet | Multipurpose | Lexical Resources | WordNet | MSA | http://globalwordnet.org/resources/arabic-wordnet/awn-browser/ |
36_3 | 36 | Arabic Q&A Dataset | question answering dataset | Question Answering | Lexical Resources | Corpus | MSA | http://xminers.club/2017/07/22/arabic-qa-dataset/ |
36_4 | 36 | AR-ASAG-Dataset | The ARabic Dataset for Automatic Short Answer Grading Evaluation | Question Answering | Lexical Resources | Corpus | MSA | https://data.mendeley.com/datasets/dj95jh332j/1 |
36_5 | 36 | DAWQAS | A Dataset for Arabic Why Question Answering System | Question Answering | Lexical Resources | Corpus | MSA | https://github.com/masun/DAWQAS |
36_6 | 36 | Arabic AskFM Dataset | Islamic question answering dataset | Question Answering | Lexical Resources | Corpus | MSA | https://github.com/Omarito2412/ASKFM |
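
Because every row above is a pipe-delimited record with the same nine columns, the catalogue can be filtered programmatically. The following is a minimal sketch, assuming the table is available as plain text. The field keys and the helper names `parse_table` and `filter_rows` are hypothetical and introduced only for illustration; the sample embeds three rows copied from the table.

```python
from typing import Dict, List, Optional

# Hypothetical field keys, one per column of the table (an assumption, not part of the catalogue).
FIELDS = ["tool_id", "reference_id", "name", "keywords", "purpose",
          "category", "sub_category", "languages", "link"]

SAMPLE = """Tools & Resources
Tool ID | Reference ID | Tool Name | Descriptive keywords | Purpose | Category | sub-category | Languages | Link |
---|---|---|---|---|---|---|---|---|
1_3 | 1 | ADAM | Morphological analyzer | Morphological Analysis | Basic Language Analysis | Morphological Analysis | Levantine, Egyptian | https://github.com/WaelSalloum/adam |
13_8 | 13 | SentencePiece | an unsupervised text tokenizer and detokenizer | Morphological Analysis | NLP Toolkit | NLP Toolkit | Multi | https://github.com/google/sentencepiece |
29_2 | 29 | FARASA | Package, NLP Toolkit | Multipurpose | NLP Toolkit | NLP Toolkit | MSA | https://farasa.qcri.org/ |
"""


def parse_table(text: str) -> List[Dict[str, str]]:
    """Turn pipe-delimited rows into dicts, skipping the title, header, and separator."""
    rows = []
    for line in text.splitlines():
        # Drop the trailing pipe, then split on the remaining pipes.
        cells = [c.strip() for c in line.strip().rstrip("|").split("|")]
        if len(cells) != len(FIELDS):
            continue  # not a table row (for example the title line)
        if cells[0] == "Tool ID" or set(cells[0]) <= {"-"}:
            continue  # header row or the ---|--- separator
        rows.append(dict(zip(FIELDS, cells)))
    return rows


def filter_rows(rows: List[Dict[str, str]],
                category: Optional[str] = None,
                language: Optional[str] = None) -> List[Dict[str, str]]:
    """Keep rows whose category and language list match (case-insensitive)."""
    selected = []
    for row in rows:
        if category and row["category"].lower() != category.lower():
            continue
        languages = [item.strip().lower() for item in row["languages"].split(",")]
        if language and language.lower() not in languages:
            continue
        selected.append(row)
    return selected


rows = parse_table(SAMPLE)
for row in filter_rows(rows, category="NLP Toolkit"):
    print(row["name"], row["link"])
```

Matching is exact per cell, so a filter on `Egyptian` will not return rows labelled `Arabic Dialects` or `Multi`; a broader search would need its own mapping between those labels.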