We are excited to share a preview of a paper by Gilbert Badaro, Ph.D., titled “A Link Prediction Approach for Accurately Mapping a Large-Scale Arabic Lexical Resource to English WordNet”. The paper will be published in the prestigious ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP).

Abstract:
English WordNet (EWN) is composed of sets of cognitive synonyms called synsets, each expressing a distinct concept and interlinked by means of conceptual-semantic and lexical relations. It has proved to be a leading example of a large-scale resource that has enabled advances in Natural Language Understanding tasks such as word sense disambiguation, question answering, sentiment analysis and emotion recognition. However, such a resource does not exist for all languages at the same scale as English WordNet, since developing one involves a great deal of time-consuming manual effort. In this paper, we focus on the Arabic language. We propose to automatically extend an existing Arabic WordNet by formulating the problem as link prediction between an existing large-scale Arabic lexicon and English WordNet. We propose the use of a two-step Boosting method, where the first step aims at linking English translations of the Arabic terms extracted from parallel corpora to EWN’s synsets. The second step uses surface similarity between the English definitions provided in the Arabic lexicon, SAMA, and EWN’s synsets. The proposed method resulted in an enhanced Arabic sentiment lexicon, ArSenL 2.0, compared to the state-of-the-art lexicon, ArSenL. A comprehensive study covering both intrinsic and extrinsic evaluations shows the superiority of the method compared to several baselines and state-of-the-art link prediction methods.
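
To make the two signals in the abstract concrete, here is a minimal Python sketch. It is not the paper's Boosting model. It only illustrates the idea of first collecting candidate synsets through English translations and then ranking them by surface similarity between an English definition and each synset's gloss. The toy synset table, the function names, and the choice of Jaccard token overlap as the similarity measure are all assumptions made for illustration.

```python
import re

# Hypothetical toy stand-in for English WordNet: synset id -> (lemmas, gloss).
TOY_SYNSETS = {
    "bank.n.01": ({"bank"}, "sloping land beside a body of water"),
    "bank.n.02": ({"bank", "depository"},
                  "a financial institution that accepts deposits"),
    "river.n.01": ({"river"}, "a large natural stream of water"),
}


def tokens(text):
    """Lowercase a string and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_synsets(translations, synsets):
    """First signal: keep synsets whose lemmas contain any English translation."""
    wanted = {t.lower() for t in translations}
    return [sid for sid, (lemmas, _) in synsets.items() if lemmas & wanted]


def rank_by_gloss(definition, candidates, synsets):
    """Second signal: rank candidates by definition/gloss token overlap."""
    def_tokens = tokens(definition)
    scored = [(jaccard(def_tokens, tokens(synsets[sid][1])), sid)
              for sid in candidates]
    return sorted(scored, reverse=True)


# An Arabic entry whose (hypothetical) English translation is "bank" and whose
# lexicon definition mentions a financial institution.
candidates = candidate_synsets(["bank"], TOY_SYNSETS)
ranking = rank_by_gloss("financial institution for deposits",
                        candidates, TOY_SYNSETS)
best = ranking[0][1]
print(candidates, best)
```

In this toy run the translation "bank" yields two candidate synsets, and the definition overlap selects the financial sense. A learned model such as the one described in the abstract would combine signals like these rather than apply them as fixed rules.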