Early Detection of Fake News on Social Media Through Propagation Path Classification with Recurrent and Convolutional Networks

EARLY DETECTION OF FAKE NEWS ON SOCIAL MEDIA by Yang Liu

The ever-increasing popularity and convenience of social media enable fake news to spread rapidly and widely, which can harm both individuals and society. Early detection of fake news is essential to minimize its social harm. Existing machine learning approaches cannot detect a fake news story soon after it starts to spread, because they require a certain amount of data to become effective, and that data takes time to accumulate. To solve this problem, this research first shows through analysis that, on social media, the user characteristics of fake news spreaders are distributed significantly differently from those of the general user population. Based on this finding, and on the fact that news spreaders' user profiles are usually readily available at the start of news propagation, this research proposes three machine learning models that detect fake news early from the user characteristics of its spreaders. The first model, Propagation Path Classification (PPC), detects fake news by combining recurrent neural networks with convolutional neural networks to classify its propagation path, which is represented as a sequence of user feature vectors. The second model, Social Media Content Classification (SMCC), improves on the first by adding 1) an embedding layer and an integration layer to model news spreaders, and 2) a fake news spreader likelihood score to model source users independently, which is particularly useful when the propagation path is extremely short, i.e., when there are only a few retweets. The third model, Fake News Early Detection (FNED), further improves on the first two by combining users' text responses with their user characteristics as status-sensitive crowd responses, which contain more information than text responses or user characteristics alone.
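As a rough illustration of the path representation that PPC classifies, the sketch below builds a fixed-length sequence of user feature vectors from the earliest spreaders of one story. The feature names (`followers`, `friends`, `verified`), the helper `build_propagation_path`, and the path length are hypothetical choices for this example, not the features or settings used in this research; shorter paths are zero-padded.

```python
from typing import Dict, List

# Hypothetical user features; the actual feature set is not given in this abstract.
FEATURES = ("followers", "friends", "verified")


def build_propagation_path(spreaders: List[Dict], path_len: int) -> List[List[float]]:
    """Return a fixed-length sequence of user feature vectors.

    `spreaders` holds one dict per user who posted or retweeted the story,
    each with a `time` key (seconds since the source post) plus the features.
    """
    # Order users by the time they engaged, source user first.
    ordered = sorted(spreaders, key=lambda u: u["time"])
    # Keep only the earliest `path_len` users (the early-detection window).
    path = [[float(u[name]) for name in FEATURES] for u in ordered[:path_len]]
    # Zero-pad short paths so every story yields the same shape.
    while len(path) < path_len:
        path.append([0.0] * len(FEATURES))
    return path


example = build_propagation_path(
    [
        {"time": 30, "followers": 12, "friends": 400, "verified": 0},
        {"time": 0, "followers": 5000, "friends": 20, "verified": 1},
    ],
    path_len=4,
)
```

A sequence of this shape is the kind of input a recurrent or convolutional layer could consume, one user feature vector per time step.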
Two novel deep learning mechanisms are also proposed as key components of the third model: 1) a position-aware attention mechanism that determines which status-sensitive crowd responses are more discriminative; and 2) multi-region mean-pooling, which aggregates intermediate features over multiple timeframes and improves performance when very few retweets are available and zero-padding is therefore needed. The third model also incorporates a PU-Learning (Learning from Positive and Unlabeled Examples) framework to handle unlabeled and imbalanced data. Comprehensive experiments were conducted to evaluate the proposed models on two datasets collected from Twitter and Sina Weibo, respectively. The experimental results demonstrate that the proposed models can detect fake news with over 90% accuracy within five minutes after it starts to spread and before it is retweeted 50 times, which is significantly faster than state-of-the-art baselines. Also, the third proposed model requires only 10% labeled fake news samples to achieve this effectiveness under PU-Learning settings. These advantages indicate that the proposed models have promising potential for deployment on real-world social media platforms for fake news detection.
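The abstract does not define multi-region mean-pooling, so the following is only one plausible reading, sketched under stated assumptions: the intermediate feature sequence is cut into contiguous regions (timeframes), each region is averaged separately, and the averages are concatenated. The function name `multi_region_mean_pool` and the `region_sizes` parameter are hypothetical.

```python
from typing import List, Sequence


def multi_region_mean_pool(
    features: Sequence[Sequence[float]], region_sizes: Sequence[int]
) -> List[float]:
    """Average feature vectors within consecutive regions and concatenate.

    `features` is a time-ordered sequence of equal-length vectors;
    `region_sizes` gives how many time steps each region covers (assumed layout).
    """
    if any(size <= 0 for size in region_sizes):
        raise ValueError("region sizes must be positive")
    if sum(region_sizes) != len(features):
        raise ValueError("region sizes must cover the whole sequence")
    pooled: List[float] = []
    start = 0
    for size in region_sizes:
        region = features[start:start + size]
        dim = len(region[0])
        # Mean of each feature dimension over the time steps in this region.
        pooled.extend(sum(vec[d] for vec in region) / size for d in range(dim))
        start += size
    return pooled


# Two real responses followed by two zero-padded steps.
seq = [[2.0, 4.0], [4.0, 8.0], [0.0, 0.0], [0.0, 0.0]]
pooled = multi_region_mean_pool(seq, region_sizes=[2, 2])
global_mean = multi_region_mean_pool(seq, region_sizes=[4])
```

In this toy case a single global mean is diluted by the padded steps (`[1.5, 3.0]`), while per-region pooling keeps the first region's mean intact (`[3.0, 6.0]`). That may be the intuition behind pooling by region on short, zero-padded sequences, though the mechanism in the actual model could differ.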
