Detecting fake news for reducing misinformation risks using analytics approaches

Abstract

Fake news plays an increasingly dominant role in spreading misinformation: it influences people's perceptions and knowledge, distorting their awareness and decision-making. The growth of social media and online forums has accelerated the spread of fake news, allowing it to blend easily with truthful information. This study proposes a novel text analytics-driven approach to fake news detection that aims to reduce the risks posed by fake news consumption. We first describe the framework for the proposed approach and the underlying analytical model, including implementation details and validation on a corpus of news data. We collect legitimate and fake news and transform it from a document-based corpus into a topic- and event-based representation. Fake news detection is then performed using a two-layered approach that detects fake topics and fake events. The efficacy of the approach is demonstrated through the implementation and validation of a novel FakE News Detection (FEND) system. The proposed approach achieves 92.49% classification accuracy and 94.16% recall at the specified threshold value of 0.6.
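The two-layered idea summarized in the abstract (a fake-topic check followed by a fake-event check) can be illustrated with a minimal sketch. This is not the FEND implementation: the abstract does not specify the similarity measure, the event representation, or which score the 0.6 threshold applies to. The sketch assumes Jaccard similarity over topic terms, events as (subject, verb, object) triples, and the same 0.6 threshold in both layers. All names (`Article`, `jaccard`, `classify`, `legit_topics`) are hypothetical.

```python
from dataclasses import dataclass

# Assumption: an event is a (subject, verb, object) triple.
Event = tuple[str, str, str]


@dataclass
class Article:
    topic_terms: set[str]  # terms describing what the article is about
    events: set[Event]     # events (claims) extracted from the article


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(article: Article,
             legit_topics: dict[str, tuple[set[str], set[Event]]],
             threshold: float = 0.6) -> str:
    """Two-layer check against topics built from legitimate news.

    legit_topics maps a topic name to (topic terms, known legitimate events).
    Using one threshold for both layers is an assumption of this sketch.
    """
    # Layer 1 (fake topic): find the most similar legitimate topic.
    best_name, best_sim = None, 0.0
    for name, (terms, _) in legit_topics.items():
        sim = jaccard(article.topic_terms, terms)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_name is None or best_sim < threshold:
        return "fake (topic)"

    # Layer 2 (fake event): share of the article's events that also
    # appear among the legitimate events of the matched topic.
    known_events = legit_topics[best_name][1]
    if not article.events:
        # Assumption: nothing to verify is treated as not credible.
        return "fake (event)"
    credibility = len(article.events & known_events) / len(article.events)
    return "legitimate" if credibility >= threshold else "fake (event)"


legit_topics = {
    "election": (
        {"vote", "ballot", "senate", "poll", "campaign"},
        {("senate", "passes", "bill"), ("voters", "cast", "ballots")},
    )
}
on_topic = {"vote", "ballot", "senate", "poll"}

real = Article(on_topic, {("senate", "passes", "bill"),
                          ("voters", "cast", "ballots")})
off_topic = Article({"alien", "ufo"}, {("aliens", "land", "capitol")})
distorted = Article(on_topic, {("senate", "passes", "bill"),
                               ("senate", "bans", "voting"),
                               ("aliens", "rig", "poll")})

for art in (real, off_topic, distorted):
    print(classify(art, legit_topics))
```

In this toy setup, an article is rejected at the first layer when it matches no legitimate topic closely enough, and at the second layer when too few of its events are corroborated by legitimate news on the matched topic. A real system would replace exact set matching with the semantic similarity and information-extraction steps described in the paper itself.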
