Automatic Fact-Checking Using Context and Discourse Information

We study the problem of automatic fact-checking, paying special attention to the impact of contextual and discourse information. We address two related tasks: (i) detecting check-worthy claims and (ii) fact-checking claims. We develop supervised systems based on neural networks, kernel-based support vector machines, and combinations thereof, which make use of rich input representations in terms of discourse cues and contextual features. For the check-worthiness estimation task, we focus on political debates, and we model the target claim in the context of the full intervention of a participant and the previous and following turns in the debate, taking into account contextual meta information. For the fact-checking task, we focus on answer verification in a community forum, and we model the veracity of the answer with respect to the entire question–answer thread in which it occurs as well as with respect to other related posts from the entire forum. We develop annotated datasets for both tasks and run an extensive experimental evaluation, which confirms that both types of information, and especially contextual features, play an important role.
