Separating Facts from Fiction: Linguistic Models to Classify Suspicious and Trusted News Posts on Twitter

Pew Research polls report that 62 percent of U.S. adults get news on social media (Gottfried and Shearer, 2016). In a December poll, 64 percent of U.S. adults said that “made-up news” has caused a “great deal of confusion” about the facts of current events (Barthel et al., 2016). Fabricated stories on social media, ranging from deliberate propaganda to hoaxes and satire, contribute to this confusion and can have serious effects on global stability. In this work we build predictive models to classify 130 thousand news posts as suspicious or verified, and to predict four sub-types of suspicious news: satire, hoaxes, clickbait, and propaganda. We show that neural network models trained on tweet content and social network interactions outperform lexical models. Unlike previous work on deception detection, we find that adding syntax and grammar features to our models does not improve performance. Incorporating linguistic features improves classification results, but social interaction features are the most informative for finer-grained separation among the four types of suspicious news posts.
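The content-based neural models referenced above (see also [6] and [18]) can be illustrated with a minimal sketch of a word-level convolutional classifier: embed tweet tokens, convolve filters over token windows, max-pool over time, and score with a logistic output. All sizes, weights, and names below are illustrative toys, not the paper's actual architecture or trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, illustrative only
VOCAB, EMB, FILTERS, WIDTH = 50, 8, 4, 3

embeddings = rng.normal(size=(VOCAB, EMB))       # word embedding table
conv_w = rng.normal(size=(FILTERS, WIDTH, EMB))  # convolution filters
out_w = rng.normal(size=FILTERS)                 # logistic output weights

def classify(token_ids):
    """Score a tweet's tokens as suspicious (near 1) vs. verified (near 0)."""
    x = embeddings[token_ids]                    # (num_tokens, EMB)
    # 1-D convolution: each filter responds to every WIDTH-token window
    feats = np.array([
        [np.sum(conv_w[f] * x[i:i + WIDTH]) for i in range(len(x) - WIDTH + 1)]
        for f in range(FILTERS)
    ])
    pooled = feats.max(axis=1)                   # max-over-time pooling
    logit = pooled @ out_w
    return 1.0 / (1.0 + np.exp(-logit))          # sigmoid probability

p = classify(np.array([3, 17, 5, 42, 9]))        # a 5-token toy "tweet"
print(round(p, 3))
```

In the paper's setting, the word embeddings would be pretrained vectors such as GloVe [25], and social interaction features could be concatenated with the pooled text representation before the output layer.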

[1] Fei-Fei Li et al. Large-Scale Video Classification with Convolutional Neural Networks. CVPR 2014.

[2] Yimin Chen et al. Automatic Deception Detection: Methods for Finding Fake News. ASIST 2015.

[3] J. Graham et al. When Morality Opposes Justice: Conservatives Have Moral Intuitions that Liberals May Not Recognize. 2007.

[4] Barbara Poblete et al. Twitter under Crisis: Can We Trust What We RT? SOMA 2010.

[5] Daniel Jurafsky et al. Linguistic Models for Analyzing and Detecting Biased Language. ACL 2013.

[6] Ye Zhang et al. A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification. IJCNLP 2015.

[7] Yimin Chen et al. Deception Detection for News: Three Types of Fakes. ASIST 2015.

[8] Jeffrey T. Hancock et al. On Lying and Being Lied To: A Linguistic Analysis of Deception in Computer-Mediated Communication. 2007.

[9] Carlo Strapparava et al. The Lie Detector: Explorations in the Automatic Recognition of Deceptive Language. ACL 2009.

[10] Yejin Choi et al. Syntactic Stylometry for Deception Detection. ACL 2012.

[11] Jimmy Ba et al. Adam: A Method for Stochastic Optimization. ICLR, 2014.

[12] Ellen Riloff et al. Learning Extraction Patterns for Subjective Expressions. EMNLP 2003.

[13] Jeffrey A. Gottfried et al. News Use across Social Media Platforms 2016. 2016.

[14] Verónica Pérez-Rosas et al. Experiments in Open Domain Deception Detection. EMNLP 2015.

[15] Bing Liu et al. Opinion Observer: Analyzing and Comparing Opinions on the Web. WWW 2005.

[16] Brian A. Nosek et al. Liberals and Conservatives Rely on Different Sets of Moral Foundations. Journal of Personality and Social Psychology, 2009.

[17] Alexander C. Berg et al. Combining Multiple Sources of Knowledge in Deep CNNs for Action Recognition. WACV 2016.

[18] Tong Zhang et al. Effective Use of Word Order for Text Categorization with Convolutional Neural Networks. NAACL 2014.

[19] Filippo Menczer et al. Fact-checking Effect on Viral Hoaxes: A Model of Misinformation Spread in Social Networks. WWW 2015.

[20] J. Hooper. On Assertive Predicates. 1975.

[21] A. Vrij et al. Cues to Deception and Ability to Detect Lies as a Function of Police Interview Styles. Law and Human Behavior, 2007.

[22] Nick Feamster et al. #bias: Measuring the Tweeting Behavior of Propagandists. ICWSM 2012.

[23] James W. Pennebaker et al. Linguistic Inquiry and Word Count (LIWC2007). 2007.

[24] Jure Leskovec et al. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. WWW 2016.

[25] Jeffrey Pennington et al. GloVe: Global Vectors for Word Representation. EMNLP 2014.