Fake News Spreader Identification in Twitter using Ensemble Modeling. Notebook for PAN at CLEF 2020

In this paper, we describe our participation in the author profiling task at PAN 2020. The task consists of identifying fake news spreaders on Twitter based on 100 selected tweets from each user's profile. Our approach uses TF-IDF and word embeddings as text representations, together with statistical and implicit features. We propose an ensemble classification model that combines all groups of features. The approach achieved competitive accuracies of 0.695 on the English subset and 0.785 on the Spanish subset of the task.
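The abstract only names the ingredients of the approach. The following is a minimal pure-Python sketch of how such pieces can fit together, under stated assumptions. Each user's tweets are concatenated into one document for TF-IDF. A few per-user statistical features are counted. The outputs of per-feature-group classifiers are then fused by weighted soft voting, which averages their predicted probabilities. Soft voting is an assumed fusion rule, not necessarily the combination scheme used in the paper. The specific statistical features, the function names (`tfidf_vectors`, `statistical_features`, `soft_vote`) and the example probabilities are hypothetical. The word-embedding branch and the base classifiers themselves are omitted.

```python
import math
import re
from collections import Counter

# Word tokens, keeping a leading '#' or '@' (\w is Unicode-aware, so Spanish works).
TOKEN_RE = re.compile(r"[#@]?\w+")


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised {term: weight} dict per document (one doc per user).

    Uses the smoothed IDF idf(t) = ln((1 + n) / (1 + df(t))) + 1,
    the same default formula as scikit-learn's TfidfVectorizer.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def statistical_features(tweets):
    """Hypothetical per-user statistics: share of tweets containing a URL,
    a hashtag, a mention, and share of tweets that start with a retweet marker."""
    n = len(tweets) or 1
    return [
        sum("http" in t for t in tweets) / n,
        sum("#" in t for t in tweets) / n,
        sum("@" in t for t in tweets) / n,
        sum(t.startswith("RT") for t in tweets) / n,
    ]


def soft_vote(model_probs, weights=None, threshold=0.5):
    """Fuse P(spreader) estimates from several feature-group models.

    model_probs: one list per model, each holding one probability per user.
    Returns (fused_probabilities, labels), where label 1 = fake news spreader.
    """
    weights = weights or [1.0] * len(model_probs)
    total = sum(weights)
    fused = [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(len(model_probs[0]))
    ]
    labels = [int(p >= threshold) for p in fused]
    return fused, labels


if __name__ == "__main__":
    users = [
        ["RT @news BREAKING: shocking truth http://x.co", "#fake cure works"],
        ["Lovely walk in the park today", "Reading a good book"],
    ]
    vectors = tfidf_vectors([" ".join(tweets) for tweets in users])
    stats = [statistical_features(tweets) for tweets in users]
    # Hypothetical probabilities from a TF-IDF model and a statistics model.
    fused, labels = soft_vote([[0.8, 0.3], [0.6, 0.1]])
    print(stats, fused, labels)
```

Averaging probabilities rather than hard votes lets a confident model outweigh an uncertain one, and the optional `weights` argument can favour the feature group that performs best on validation data.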