A Stylometric Inquiry into Hyperpartisan and Fake News

We report on a comparative style analysis of hyperpartisan (extremely one-sided) news and fake news. A corpus of 1,627 articles from 9 political publishers, three each from the mainstream, the hyperpartisan left, and the hyperpartisan right, has been fact-checked by professional journalists at BuzzFeed: 97% of the 299 fake news articles identified are also hyperpartisan. We show how a style analysis can distinguish hyperpartisan news from the mainstream (F1 = 0.78), and satire from both (F1 = 0.81). But stylometry is no silver bullet: style-based fake news detection does not work (F1 = 0.46). We further reveal that left-wing and right-wing news share significantly more stylistic similarities than either does with the mainstream. This result is robust: it has been confirmed by three different modeling approaches, one of which employs Unmasking in a novel way. Applications of our results include partisanship detection and pre-screening for semi-automatic fake news detection.
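
Unmasking, originally proposed by Koppel et al. for authorship verification, repeatedly trains a classifier to separate two text collections, records its cross-validated accuracy, and then removes the most discriminative features. How quickly accuracy degrades indicates how similar the two collections are in style. The abstract does not say how the authors adapted the method, so the following is only a minimal sketch of the classic two-collection procedure under simplifying assumptions: whitespace tokenization, relative frequencies of the most frequent words as features, a nearest-centroid linear classifier standing in for the linear SVM of the original method, and leave-one-out instead of k-fold cross-validation. All function names and parameter defaults (`unmask`, `chunk_size`, `n_features`, `rounds`, `drop`) are hypothetical and illustrative.

```python
from collections import Counter


def chunk_words(text, size):
    """Split a text into consecutive chunks of `size` words; a trailing partial chunk is dropped."""
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words) - size + 1, size)]


def featurize(chunks, vocab):
    """Represent each chunk by the relative frequency of each vocabulary word."""
    rows = []
    for chunk in chunks:
        counts = Counter(chunk)
        rows.append([counts[w] / len(chunk) for w in vocab])
    return rows


def train_centroid(X, y, active):
    """Nearest-centroid linear classifier (assumed stand-in for a linear SVM).

    Weights are the difference of the class centroids over the active features;
    the bias puts the decision boundary at the midpoint between the centroids.
    """
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]

    def mean(rows, j):
        return sum(r[j] for r in rows) / len(rows)

    mu_pos = {j: mean(pos, j) for j in active}
    mu_neg = {j: mean(neg, j) for j in active}
    w = {j: mu_pos[j] - mu_neg[j] for j in active}
    bias = -sum(w[j] * (mu_pos[j] + mu_neg[j]) / 2 for j in active)
    return w, bias


def loo_accuracy(X, y, active):
    """Leave-one-out accuracy of the centroid classifier on the active features."""
    correct = 0
    for i in range(len(X)):
        w, bias = train_centroid(X[:i] + X[i + 1:], y[:i] + y[i + 1:], active)
        score = sum(w[j] * X[i][j] for j in active) + bias
        correct += int((score > 0) == (y[i] == 1))
    return correct / len(X)


def unmask(text_a, text_b, chunk_size=500, n_features=250, rounds=10, drop=3):
    """Return the accuracy curve obtained by iteratively removing the strongest features."""
    if drop < 1:
        raise ValueError("drop must be at least 1")
    chunks_a = chunk_words(text_a, chunk_size)
    chunks_b = chunk_words(text_b, chunk_size)
    if len(chunks_a) < 2 or len(chunks_b) < 2:
        raise ValueError("each text needs at least two full chunks")
    # Features: the most frequent words over both collections.
    freq = Counter(w for chunk in chunks_a + chunks_b for w in chunk)
    vocab = [w for w, _ in freq.most_common(n_features)]
    X = featurize(chunks_a + chunks_b, vocab)
    y = [1] * len(chunks_a) + [0] * len(chunks_b)
    active = set(range(len(vocab)))
    curve = []
    for _ in range(rounds):
        if not active:
            break
        curve.append(loo_accuracy(X, y, active))
        w, _ = train_centroid(X, y, active)
        ranked = sorted(active, key=lambda j: w[j])
        # Remove the `drop` most negative and the `drop` most positive weights.
        active -= set(ranked[:drop]) | set(ranked[-drop:])
    return curve
```

A call such as `unmask(left_text, right_text)` returns one leave-one-out accuracy per round, where `left_text` and `right_text` are hypothetical concatenations of articles from two publisher groups. In Unmasking, a curve that drops quickly suggests the collections differ only in a few surface features, while a curve that stays high suggests a deeper stylistic difference. The nearest-centroid classifier keeps the sketch dependency-free; with scikit-learn, `LinearSVC` and its `coef_` weights would be the closer match to the original method.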
