Authors: Wissam Antoun, Fady Baly, Rim Achour, Amir Hussein, Hazem Hajj

Abstract:

This paper presents state-of-the-art methods for addressing three important challenges in automated fake news detection: fake news detection, domain identification, and bot identification in tweets. The proposed solutions achieved first place in a recent international competition on fake news. For fake news detection, we present two models. The winning model in the competition combines the similarities between the embedding of each article’s title and the embeddings of the top five corresponding Google search results. The second model relies on advances in end-to-end deep learning models for Natural Language Understanding (NLU) to identify stylistic differences between legitimate and fake news articles. It was developed after the competition and outperforms the winning approach. For news domain detection, the winning model is a hybrid approach that concatenates named entity features with semantic embeddings derived from end-to-end models. For Twitter bot detection, we propose the following features: the duration between account creation and the tweet date, the presence of a link in the tweet, the presence of the user’s location, other tweet-level features, and tweet metadata. The experiments provide insights into the importance of the different features, and the results indicate the superior performance of all proposed models.
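
As an illustration of the bot detection features listed above, the following minimal sketch computes three of them (account age at tweet time, presence of a link, presence of a user location) from a single tweet record. The dictionary keys, the function name `bot_features`, and the substring-based link check are assumptions made for this example; they do not reflect a specific Twitter API schema or the exact feature extraction used in the competition.

```python
from datetime import datetime


def bot_features(tweet: dict) -> dict:
    """Compute three illustrative bot-detection features for one tweet.

    The keys read from `tweet` are hypothetical field names chosen for this
    sketch, not a real API schema.
    """
    created = datetime.fromisoformat(tweet["account_created_at"])
    posted = datetime.fromisoformat(tweet["tweet_created_at"])

    # Duration between account creation and the tweet date, in days.
    account_age_days = (posted - created).total_seconds() / 86400.0

    # Presence of a link in the tweet text (simple substring check).
    text = tweet.get("text") or ""
    has_link = int("http://" in text or "https://" in text)

    # Presence of a non-empty user location.
    location = tweet.get("user_location") or ""
    has_location = int(bool(location.strip()))

    return {
        "account_age_days": account_age_days,
        "has_link": has_link,
        "has_location": has_location,
    }


# Made-up record used only to exercise the function.
example = {
    "account_created_at": "2019-01-01T00:00:00+00:00",
    "tweet_created_at": "2019-01-03T12:00:00+00:00",
    "text": "Breaking story https://example.com/news",
    "user_location": "",
}
features = bot_features(example)
print(features)
```

A feature dictionary of this kind could then be passed to any standard classifier; the choice of classifier is outside the scope of this sketch.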