Two-Path Deep Semisupervised Learning for Timely Fake News Detection

News on social media platforms such as Twitter is generated at high volume and speed, yet very little of it is labeled (as fake or true) by professionals in near real time. To achieve timely detection of fake news on social media, we propose a novel two-path deep semisupervised learning (SSL) framework in which one path performs supervised learning and the other performs unsupervised learning. The supervised path learns from the limited amount of labeled data, while the unsupervised path learns from the much larger amount of unlabeled data. The two paths, both implemented with convolutional neural networks (CNNs), are jointly optimized to complete SSL. In addition, a shared CNN extracts low-level features from both labeled and unlabeled data and feeds them into the two paths. To verify this framework, we implement a Word CNN-based SSL model and test it on two data sets: LIAR and PHEME. Experimental results demonstrate that the model built on the proposed framework can recognize fake news effectively with very little labeled data.
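The joint optimization of the two paths can be illustrated with a small sketch of a combined loss over one mini-batch. The abstract does not specify the loss functions, so everything here is an assumption: the supervised term is taken to be cross-entropy on the labeled samples, the unsupervised term is taken to be the mean squared error between the two paths' outputs on all samples, and the names `joint_loss` and `unsup_weight` are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for one sample."""
    return -math.log(max(probs[label], 1e-12))


def mean_squared_error(p, q):
    """Mean squared difference between two probability vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def joint_loss(sup_probs, unsup_probs, labels, unsup_weight=1.0):
    """Combine a supervised and an unsupervised term over one mini-batch.

    sup_probs, unsup_probs: per-sample class probabilities from the two paths.
    labels: class index for labeled samples, None for unlabeled ones.
    unsup_weight: assumed scalar that balances the two terms.
    """
    # Supervised term: averaged over the labeled samples only.
    labeled = [(p, y) for p, y in zip(sup_probs, labels) if y is not None]
    if labeled:
        sup_term = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    else:
        sup_term = 0.0

    # Unsupervised term: agreement between the two paths on every sample,
    # labeled or not, so unlabeled news still contributes to training.
    unsup_term = sum(
        mean_squared_error(p, q) for p, q in zip(sup_probs, unsup_probs)
    ) / len(sup_probs)

    return sup_term + unsup_weight * unsup_term


# One labeled sample (class 0) and two unlabeled samples.
sup = [[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]]
unsup = [[0.8, 0.2], [0.3, 0.7], [0.5, 0.5]]
labels = [0, None, None]
loss = joint_loss(sup, unsup, labels, unsup_weight=0.5)
```

In this sketch, only the first sample contributes to the supervised term, while all three contribute to the unsupervised term. This is one plausible way a model could learn from unlabeled data alongside a handful of labels.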
