Fake news detection in multiple platforms and languages

Abstract The debate around fake news has grown recently because of the potential harm it can cause in different fields, with politics among the most affected. Given the volume of news published every day, several computer science studies have proposed machine learning models to detect fake news. However, most of these studies focus on news in a single language (mostly English) or rely on features specific to particular social media platforms (such as Twitter or Sina Weibo). Our work proposes to detect fake news using only text features that can be generated regardless of the source platform and that are as language-independent as possible. We carried out experiments on five datasets, comprising both texts and social media posts, in three language groups (Germanic, Latin, and Slavic) and obtained competitive results compared to benchmarks. We also compared the results obtained with a custom set of features against those of other popular natural language processing techniques, such as bag-of-words and Word2Vec.
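To make the idea of platform- and language-independent text features concrete, the following is a minimal sketch and not the feature set used in this work, which the abstract does not enumerate. The function name `text_features` and the specific statistics (length counts, character-class ratios, type-token ratio) are assumptions chosen for illustration; they rely only on Unicode character properties and whitespace tokenization, so they need no language-specific dictionary or tagger.

```python
import unicodedata


def text_features(text: str) -> dict[str, float]:
    """Compute surface statistics that require no language-specific resources.

    Hypothetical illustration: these features are assumptions, not the
    custom feature set evaluated in the paper.
    """
    words = text.split()  # whitespace tokenization, no language model needed
    n_chars = len(text)
    n_words = len(words)

    # Unicode-aware checks work across scripts (Latin, Cyrillic, ...).
    n_upper = sum(1 for c in text if c.isupper())
    n_digit = sum(1 for c in text if c.isdigit())
    n_punct = sum(1 for c in text if unicodedata.category(c).startswith("P"))

    unique_words = {w.lower() for w in words}

    return {
        "n_chars": float(n_chars),
        "n_words": float(n_words),
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "upper_ratio": n_upper / n_chars if n_chars else 0.0,
        "digit_ratio": n_digit / n_chars if n_chars else 0.0,
        "punct_ratio": n_punct / n_chars if n_chars else 0.0,
        "type_token_ratio": len(unique_words) / n_words if n_words else 0.0,
    }
```

A feature dictionary like this could be turned into a numeric vector and passed to any standard classifier; the same function applies unchanged to a news article or a short social media post.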
