Tracing Fake-News Footprints: Characterizing Social Media Messages by How They Propagate

When a message, such as a piece of news, spreads in social networks, how can we classify it into categories of interest, such as genuine or fake news? Classification of social media content is a fundamental task in social media mining. Most existing methods treat it as a text categorization problem and focus mainly on content features, such as words and hashtags. However, for many emerging applications like fake news and rumor detection, it is very challenging, if not impossible, to identify useful features from content. For example, intentional spreaders of fake news may manipulate the content to make it look like real news. To address this problem, this paper concentrates on modeling the propagation of messages in a social network. Specifically, we propose a novel approach, TraceMiner, to (1) infer embeddings of social media users from social network structures; and (2) utilize an LSTM-RNN to represent and classify the propagation pathways of a message. Because TraceMiner relies on propagation rather than content, which is sparse and noisy on social media, it can achieve high classification accuracy even in the absence of content information. Experimental results on real-world datasets show that TraceMiner outperforms state-of-the-art approaches on the tasks of fake news detection and news categorization.
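The two-step idea in the abstract can be illustrated with a minimal sketch, under stated assumptions. It is not the authors' implementation: the user embeddings are hand-written placeholders (in practice they would come from a network embedding method), the LSTM weights are random and untrained, and the names `TinyLSTMClassifier`, `user_embedding`, and `pathway_users` are hypothetical. The sketch only shows how a propagation pathway, represented as a time-ordered sequence of user embeddings, can be fed through an LSTM and mapped to class probabilities. Using the final hidden state for classification is also an assumption of this sketch.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMClassifier:
    """Forward pass of a single-layer LSTM followed by a softmax layer.

    Weights are random placeholders (assumption): a real model would be
    trained on labeled propagation pathways.
    """

    def __init__(self, embed_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x_t ; h_{t-1}].
        self.W = {g: rand_matrix(hidden_dim, embed_dim + hidden_dim, rng)
                  for g in "ifoc"}
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}
        self.W_out = rand_matrix(num_classes, hidden_dim, rng)

    def _gate(self, name, z, activation):
        pre = matvec(self.W[name], z)
        return [activation(p + b) for p, b in zip(pre, self.b[name])]

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        i = self._gate("i", z, sigmoid)    # input gate
        f = self._gate("f", z, sigmoid)    # forget gate
        o = self._gate("o", z, sigmoid)    # output gate
        g = self._gate("c", z, math.tanh)  # candidate cell state
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, pathway):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in pathway:  # users in the order they spread the message
            h, c = self.step(x, h, c)
        logits = matvec(self.W_out, h)
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Step 1 (placeholder): embeddings that would be inferred from network structure.
user_embedding = {
    "u1": [0.9, 0.1, 0.0],
    "u2": [0.1, 0.8, 0.1],
    "u3": [0.0, 0.2, 0.9],
}

# Step 2: a message's propagation pathway as a time-ordered sequence of users.
pathway_users = ["u1", "u3", "u2"]
pathway = [user_embedding[u] for u in pathway_users]

model = TinyLSTMClassifier(embed_dim=3, hidden_dim=4, num_classes=2)
probs = model.predict_proba(pathway)  # e.g. [P(genuine), P(fake)], untrained
print(probs)
```

No message text enters the model at any point: the input is only who spread the message and in what order, which is what lets a pathway-based classifier operate when content features are unavailable.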
