A Stylometric Inquiry into Hyperpartisan and Fake News

We report on a comparative style analysis of hyperpartisan (extremely one-sided) news and fake news. A corpus of 1,627 articles from nine political publishers, three each from the mainstream, the hyperpartisan left, and the hyperpartisan right, has been fact-checked by professional journalists at BuzzFeed: 97% of the 299 fake news articles identified are also hyperpartisan. We show how a style analysis can distinguish hyperpartisan news from the mainstream (F1 = 0.78), and satire from both (F1 = 0.81). But stylometry is no silver bullet: style-based fake news detection does not work (F1 = 0.46). We further reveal that left-wing and right-wing news share significantly more stylistic similarities with each other than either does with the mainstream. This result is robust: it has been confirmed by three different modeling approaches, one of which employs Unmasking in a novel way. Applications of our results include partisanship detection and pre-screening for semi-automatic fake news detection.
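For readers unfamiliar with the F1 measure quoted above, the following is a minimal sketch of how it is commonly computed for one class from a confusion matrix. The function name and the example counts are hypothetical and chosen for illustration only; they are not taken from the study, and the abstract does not state which averaging scheme the reported scores use.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall) for one class."""
    if true_positives == 0:
        # With no true positives, precision or recall is zero (or undefined);
        # by convention the F1 score is reported as 0.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not data from the study):
# 3 articles correctly flagged, 1 wrongly flagged, 3 missed.
example = f1_score(true_positives=3, false_positives=1, false_negatives=3)
```

Because F1 is a harmonic mean, a classifier scores well only when precision and recall are both high. Whether a given value such as 0.46 counts as a failure depends on the class balance and on the baselines it is compared against.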
