Automatic Fake News Detection: Are Models Learning to Reason?

Most fact-checking models for automatic fake news detection are based on reasoning: given a claim and associated evidence, the model estimates the claim's veracity from the supporting or refuting content in the evidence. When these models perform well, it is generally assumed that they have learned to reason over the evidence with respect to the claim. In this paper, we investigate this assumption by examining the relative importance of the claim and the evidence. Surprisingly, on political fact-checking datasets we find that the highest effectiveness is most often obtained using the evidence alone; including the claim has an effect that is either negligible or harmful. This highlights an important problem in what constitutes evidence in existing approaches to automatic fake news detection.
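To make the input-ablation comparison concrete, below is a minimal sketch, not the paper's actual pipeline: a TF-IDF plus random-forest classifier evaluated under three input conditions (claim only, evidence only, claim plus evidence). All example texts, labels, and dataset fields are hypothetical placeholders. The diagnostic is the comparison itself: if the evidence-only condition matches or beats claim-plus-evidence, the model is unlikely to be reasoning over the claim at all.

```python
# Sketch of an input-ablation experiment for evidence-based fact checking.
# Assumed setup (TF-IDF features, random forest); toy data is hypothetical.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hypothetical toy examples: (claim, retrieved evidence, veracity label).
claims = [
    "Candidate X cut taxes in 2015.",
    "The new bill bans all imports.",
    "Unemployment fell to 4% last year.",
    "The city doubled its police budget.",
]
evidence = [
    "Official records show taxes rose in 2015.",
    "The bill restricts, but does not ban, imports.",
    "Bureau statistics confirm a 4% unemployment rate.",
    "Budget documents confirm the police budget doubled.",
]
labels = ["false", "false", "true", "true"]

def evaluate(texts, labels):
    """Train and score one input condition; returns macro F1 on held-out data."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        texts, labels, test_size=0.5, random_state=0
    )
    vec = TfidfVectorizer()
    clf = RandomForestClassifier(random_state=0)
    clf.fit(vec.fit_transform(X_tr), y_tr)
    return f1_score(y_te, clf.predict(vec.transform(X_te)), average="macro")

# Three ablation conditions over the same label set.
for name, texts in [
    ("claim only", claims),
    ("evidence only", evidence),
    ("claim + evidence", [c + " " + e for c, e in zip(claims, evidence)]),
]:
    print(f"{name}: macro F1 = {evaluate(texts, labels):.2f}")
```

On a real dataset, running the same loop over claim-only, evidence-only, and combined inputs gives the comparison the abstract describes; the toy data here is only to make the script self-contained.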
