Unsupervised Content-Based Identification of Fake News Articles with Tensor Decomposition Ensembles

Social media provide a platform for quick and seamless access to information. However, the propagation of false information, particularly during the last year, raises major concerns, given that social media are the primary source of information for a large percentage of the population. False information may manipulate people’s beliefs and have real-life consequences. Therefore, a major challenge is to automatically identify false information by categorizing it into different types and to notify users about the credibility of different articles shared online. Existing work primarily focuses on feature generation and selection from various sources, including corpus-related features. However, prior work has so far not paid considerable attention to the following question: how accurately can we distinguish different categories of false news based solely on the content? In this paper we work on answering this question. In particular, we propose a tensor modeling of the problem, where we capture latent relations between articles and terms, as well as spatial/contextual relations between terms, towards unlocking the full potential of the content. Furthermore, we propose an ensemble method which judiciously combines and consolidates results from different tensor decompositions into clean, coherent, and high-accuracy groups of articles that belong to different categories of false news. We extensively evaluate our proposed method on real data, for which we have labels, and demonstrate that the proposed algorithm was able to identify all different false news categories within the corpus, with average homogeneity per group of up to 80%.
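
To make the tensor modeling and ensemble idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a (term x term x article) tensor of within-window co-occurrence counts, a plain CP/PARAFAC decomposition fitted by alternating least squares, and a co-association (consensus) matrix as a stand-in for the ensemble step. The function names, the window size, and the choice of ranks and seeds are all hypothetical.

```python
import numpy as np

def cooccurrence_tensor(docs, window=2):
    """Build a (term x term x article) tensor of within-window co-occurrence counts."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(vocab), len(docs)))
    for k, d in enumerate(docs):
        toks = d.split()
        for i, w in enumerate(toks):
            # Count each pair symmetrically so every article slice is symmetric.
            for v in toks[i + 1 : i + 1 + window]:
                X[index[w], index[v], k] += 1
                X[index[v], index[w], k] += 1
    return X, vocab

def khatri_rao(A, B):
    """Column-wise Kronecker product; row (a, b) is stored at a * B.shape[0] + b."""
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, A.shape[1])

def cp_als(X, rank, n_iter=50, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor via alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((n, rank)) for n in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            # Mode unfolding whose column order matches khatri_rao(others[0], others[1]).
            Xm = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
            kr = khatri_rao(others[0], others[1])
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            factors[mode] = Xm @ kr @ np.linalg.pinv(gram)
    return factors

def coassociation(X, ranks=(2, 3), seeds=(0, 1)):
    """Fraction of decompositions in which two articles fall in the same component."""
    n = X.shape[2]
    C = np.zeros((n, n))
    runs = 0
    for r in ranks:
        for s in seeds:
            article_factor = cp_als(X, r, seed=s)[2]
            # Assign each article to its strongest latent component (an assumption).
            labels = np.abs(article_factor).argmax(axis=1)
            C += labels[:, None] == labels[None, :]
            runs += 1
    return C / runs

# Toy corpus, purely illustrative.
docs = [
    "shocking secret cure doctors hate",
    "secret cure shocking doctors reveal",
    "senate passes budget bill today",
    "budget bill passes senate vote",
]
X, vocab = cooccurrence_tensor(docs)
C = coassociation(X)
```

Entries of `C` close to 1 mark pairs of articles that are grouped together across most decompositions; thresholding or clustering `C` would then yield the consolidated groups. How the paper itself consolidates the decompositions may differ from this consensus scheme.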
