A Survey on Natural Language Processing for Fake News Detection

Fake news detection is a critical yet challenging problem in Natural Language Processing (NLP). The rapid rise of social networking platforms has greatly increased access to information, but it has also accelerated the spread of fake news. As a result, the impact of fake news has grown, sometimes extending to the offline world and threatening public safety. Given the massive amount of Web content, automatic fake news detection is a practical NLP problem relevant to all online content providers, since it reduces the human time and effort needed to detect fake news and prevent its spread. In this paper, we describe the challenges involved in fake news detection and the tasks related to it. We systematically review and compare the task formulations, datasets, and NLP solutions that have been developed for this task, and discuss their potential and limitations. Based on our insights, we outline promising research directions, including more fine-grained, detailed, fair, and practical detection models. We also highlight the differences between fake news detection and other related tasks, and the importance of NLP solutions for fake news detection.
