FakeDetector: Effective Fake News Detection with Deep Diffusive Neural Network

In recent years, with the rapid growth of online social networks, fake news serving various commercial and political purposes has appeared in large numbers and spread widely across the online world. Misled by deceptive wording, online social network users can easily be influenced by such fake news, which has already had tremendous effects on offline society. An important goal in improving the trustworthiness of information in online social networks is to identify fake news in a timely manner. This paper investigates the principles, methodologies and algorithms for detecting fake news articles, creators and subjects in online social networks, and evaluates the corresponding performance. It addresses the challenges introduced by the unknown characteristics of fake news and the diverse connections among news articles, creators and subjects. To this end, the paper introduces a novel gated graph neural network, namely FAKEDETECTOR. Based on a set of explicit and latent features extracted from the textual information, FAKEDETECTOR builds a deep diffusive network model to learn the representations of news articles, creators and subjects simultaneously. Extensive experiments have been conducted on a real-world fake news dataset to compare FAKEDETECTOR with several state-of-the-art models, and the experimental results are provided in the full version of this paper at [13].
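
To make the idea of learning article, creator and subject representations over a shared network more concrete, the following is a minimal sketch and not the FAKEDETECTOR architecture itself. It assumes each article links to one creator and a list of subjects, and it replaces the learned gates of the actual model with a single fixed scalar gate. The names `build_graph`, `diffuse` and `gate_logit` are hypothetical and introduced only for illustration.

```python
import math
from collections import defaultdict


def sigmoid(x):
    """Logistic function, used here to turn a logit into a gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def build_graph(articles):
    """Build an undirected article-creator-subject adjacency map.

    articles: dict mapping article id -> (creator id, list of subject ids).
    This input layout is an assumption made for the sketch.
    """
    neighbors = defaultdict(set)
    for article, (creator, subjects) in articles.items():
        for other in [creator, *subjects]:
            neighbors[article].add(other)
            neighbors[other].add(article)
    return neighbors


def diffuse(features, neighbors, gate_logit=0.0, steps=2):
    """Gated mean diffusion over the network (illustrative only).

    Each step sets a node's state to
        g * own_feature + (1 - g) * mean(neighbor states),
    where g = sigmoid(gate_logit) is a fixed scalar. A trained model would
    learn its gates from data; the fixed gate is a simplifying assumption.
    """
    g = sigmoid(gate_logit)
    state = {node: list(vec) for node, vec in features.items()}
    for _ in range(steps):
        new_state = {}
        for node, own in features.items():
            nbrs = neighbors.get(node, ())
            if nbrs:
                mean = [
                    sum(state[n][d] for n in nbrs) / len(nbrs)
                    for d in range(len(own))
                ]
            else:
                mean = list(own)  # isolated node keeps its own feature
            new_state[node] = [g * o + (1.0 - g) * m for o, m in zip(own, mean)]
        state = new_state
    return state


if __name__ == "__main__":
    # Toy data: two articles by the same creator on the same subject.
    toy_articles = {"a1": ("c1", ["s1"]), "a2": ("c1", ["s1"])}
    toy_features = {"a1": [1.0], "a2": [0.0], "c1": [0.5], "s1": [0.5]}
    print(diffuse(toy_features, build_graph(toy_articles)))
```

The point of the sketch is only that the states of articles, creators and subjects are updated together, so evidence about one node type can influence the others through the network connections.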

[1]  Geoffrey E. Hinton.  A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.

[2]  Philip S. Yu,et al.  Discovering Audience Groups and Group-Specific Influencers , 2015, ECML/PKDD.

[3]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[4]  Victoria L. Rubin,et al.  Fake News or Truth? Using Satirical Cues to Detect Potentially Misleading News , 2016 .

[5]  Neil Daswani,et al.  The Anatomy of Clickbot.A , 2007, HotBots.

[6]  Jian Pei,et al.  Sketching Landscapes of Page Farms , 2007, SDM.

[7]  Eunsol Choi,et al.  Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking , 2017, EMNLP.

[8]  Xinchang Zhang,et al.  Link based small sample learning for web spam detection , 2009, WWW '09.

[9]  Zhoujun Li,et al.  TI-CNN: Convolutional Neural Networks for Fake News Detection , 2018, ArXiv.

[10]  Philip S. Yu,et al.  HeteroSales: Utilizing Heterogeneous Social Networks to Identify the Next Enterprise Customer , 2016, WWW.

[11]  Huan Liu,et al.  Exploiting Tri-Relationship for Fake News Detection , 2017, ArXiv.

[12]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[13]  Christos Faloutsos,et al.  Opinion Fraud Detection in Online Reviews by Network Effects , 2013, ICWSM.

[14]  Philip S. Yu,et al.  Influence Maximization Across Partially Aligned Heterogenous Social Networks , 2015, PAKDD.

[15]  Eugenio Tacchini,et al.  Some Like it Hoax: Automated Fake News Detection in Social Networks , 2017, ArXiv.

[16]  Chunheng Wang,et al.  Improving web spam detection with re-extracted features , 2008, WWW.

[17]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[18]  Yiqun Liu,et al.  User behavior oriented web spam detection , 2008, WWW.

[19]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[20]  Oliver A. McBryan,et al.  GENVL and WWWW: Tools for taming the Web , 1994, WWW Spring 1994.

[21]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[22]  Rashmi Raj,et al.  Web Spam Detection with Anti-Trust Rank , 2006, AIRWeb.

[23]  Toru Ishida,et al.  Analysis and improvement of HITS algorithm for detecting Web communities , 2004, Systems and Computers in Japan.

[24]  Philip S. Yu,et al.  Review spam detection via time series pattern discovery , 2012, WWW.

[25]  J. Morris Chang,et al.  An Effective Method for Combating Malicious Scripts Clickbots , 2009, ESORICS.

[26]  Geoffrey E. Hinton,et al.  A Scalable Hierarchical Distributed Language Model , 2008, NIPS.

[27]  Krishna Bharat,et al.  Improved algorithms for topic distillation in a hyperlinked environment , 1998, SIGIR '98.

[28]  Jason Weston,et al.  Large scale image annotation: learning to rank with joint word-image embeddings , 2010, Machine Learning.

[29]  Geoffrey E. Hinton,et al.  Semantic hashing , 2009, Int. J. Approx. Reason..

[30]  Malik Magdon-Ismail,et al.  Optimal Link Bombs are Uncoordinated , 2005, AIRWeb.

[31]  Marc Najork,et al.  Detecting spam web pages through content analysis , 2006, WWW '06.

[32]  Fabrizio Silvestri,et al.  Know your neighbors: web spam detection using the web topology , 2007, SIGIR.

[33]  Vincent S. Foster.  The Great Moon Hoax , 2016 .

[34]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[35]  Torsten Suel,et al.  Improving web spam classifiers using link structure , 2007, AIRWeb '07.

[36]  Tara N. Sainath,et al.  Deep Neural Network Language Models , 2012, WLM@NAACL-HLT.

[37]  M. Gentzkow,et al.  Social Media and Fake News in the 2016 Election , 2017 .

[38]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[39]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[40]  David Maxwell Chickering,et al.  Improving Cloaking Detection using Search Query Popularity and Monetizability , 2006, AIRWeb.

[41]  Rajeev Motwani,et al.  Stratified Planning , 2009, IJCAI.

[42]  Isha Ghosh,et al.  Automated Fake News Detection Using Linguistic Analysis and Machine Learning , 2017 .

[43]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .

[44]  Tie-Yan Liu,et al.  BrowseRank: letting web users vote for page importance , 2008, SIGIR '08.

[45]  Jason Weston,et al.  WSABIE: Scaling Up to Large Vocabulary Image Annotation , 2011, IJCAI.

[46]  Philip S. Yu,et al.  Modeling and utilizing dynamic influence strength for personalized promotion , 2015, 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

[47]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[48]  Philip S. Yu,et al.  Review spam detection via temporal pattern discovery , 2012, KDD.

[49]  Filip Radlinski,et al.  Addressing Malicious Noise in Clickthrough Data , 2007 .

[50]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[51]  Brian Kingsbury,et al.  New types of deep neural network learning for speech recognition and related applications: an overview , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[52]  Jun Hu,et al.  Detecting and characterizing social spam campaigns , 2010, CCS '10.

[53]  Nicole Immorlica,et al.  Click Fraud Resistant Methods for Learning Click-Through Rates , 2005, WINE.

[54]  Jun-Lin Lin.  Detection of cloaked web spam by using tag-based methods , 2009, Expert Syst. Appl..

[55]  Calton Pu,et al.  Predicting web spam with HTTP session information , 2008, CIKM '08.

[56]  Xiaojie Yuan,et al.  Are click-through data adequate for learning web search rankings? , 2008, CIKM '08.

[57]  Hector Garcia-Molina,et al.  Link Spam Alliances , 2005, VLDB.

[58]  Mingzhe Wang,et al.  LINE: Large-scale Information Network Embedding , 2015, WWW.

[59]  Marc Najork,et al.  On the evolution of clusters of near-duplicate Web pages , 2003, LA-WEB.

[60]  Marc Najork,et al.  Spam, damn spam, and statistics: using statistical analysis to locate spam web pages , 2004, WebDB '04.

[61]  William Yang Wang.  “Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection , 2017, ACL.

[62]  Shlomo Moran,et al.  SALSA: the stochastic approach for link-structure analysis , 2001, TOIS.

[63]  Stephen E. Robertson,et al.  Simple BM25 extension to multiple weighted fields , 2004, CIKM '04.

[64]  Brian D. Davison,et al.  Cloaking and Redirection: A Preliminary Study , 2005, AIRWeb.

[65]  Brian D. Davison,et al.  Detecting semantic cloaking on the web , 2006, WWW '06.

[66]  Hector Garcia-Molina,et al.  Combating Web Spam with TrustRank , 2004, VLDB.

[67]  Qiang Wu,et al.  Improving web spam classification using rank-time features , 2007, AIRWeb '07.

[68]  Marc Najork,et al.  Detecting phrase-level duplication on the world wide web , 2005, SIGIR '05.

[69]  Tobias Scheffer,et al.  Thwarting the Nigritude Ultramarine: Learning to Identify Link Spam , 2005, ECML.

[70]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[71]  Thomas Lavergne,et al.  Tracking Web Spam with Hidden Style Similarity , 2006, AIRWeb.

[72]  Suhang Wang,et al.  Fake News Detection on Social Media: A Data Mining Perspective , 2017, SIGKDD Explorations.