Factual Error Correction of Claims

This paper introduces the task of factual error correction: editing a claim so that the resulting rewrite is supported by evidence. The task serves two purposes: first, it provides a mechanism for correcting written texts that contain misinformation; second, it acts as an inherent explanation for claims that are already partially supported by evidence. We demonstrate that factual error correction is possible without any additional training data, using distant supervision and retrieved evidence. We release a dataset of 65,000 instances, based on a recent fact verification dataset, and use it to compare our distantly supervised method to a fully supervised ceiling system. Our manual evaluation indicates which automated evaluation metrics correlate best with human judgements of factuality and of whether errors were actually corrected.
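To make the task's input/output contract concrete, here is a minimal illustrative sketch. The paper's actual system uses distant supervision over retrieved evidence with learned models; this toy stands in for that with a hypothetical lookup table of candidate replacements (`entity_fixes` is an invented name), applying a replacement only when it is literally supported by the evidence text.

```python
def correct_claim(claim: str, evidence: str, entity_fixes: dict) -> str:
    """Toy factual error corrector (illustration only, not the paper's method).

    Rewrites `claim` token by token, replacing a token with its candidate
    correction from `entity_fixes` only when that correction actually
    appears in `evidence` -- i.e. the rewrite stays evidence-supported.
    """
    corrected = []
    for tok in claim.split():
        fix = entity_fixes.get(tok)
        # Keep the original token unless the replacement is supported by evidence.
        corrected.append(fix if fix is not None and fix in evidence else tok)
    return " ".join(corrected)
```

For example, given the claim "Paris is the capital of Germany" and the evidence sentence "Paris is the capital of France", a candidate fix `{"Germany": "France"}` yields the evidence-supported rewrite "Paris is the capital of France", while a fix whose replacement is absent from the evidence leaves the claim unchanged.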
