Synthetic Disinformation Attacks on Automated Fact Verification Systems

Automated fact-checking is a needed technology for curtailing the spread of online misinformation. One prevailing framework verifies claims by retrieving supporting or refuting evidence from related textual sources. However, realistic use cases will require fact-checkers to verify claims against evidence sources that may themselves be contaminated by the same misinformation. Moreover, modern NLP tools capable of producing coherent, fabricated content would allow malicious actors to systematically generate adversarial disinformation targeting fact-checkers. In this work, we explore the sensitivity of automated fact-checkers to synthetic adversarial evidence in two simulated settings: ADVERSARIAL ADDITION, where we fabricate documents and add them to the evidence repository available to the fact-checking system, and ADVERSARIAL MODIFICATION, where existing evidence source documents in the repository are automatically altered. Our study across multiple models on three benchmarks demonstrates that these systems suffer significant performance drops when facing such attacks. Finally, we discuss the growing threat that modern NLG systems pose as generators of disinformation, in the context of the challenges they present to automated fact-checkers.
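
To make the two attack settings concrete, the sketch below shows how each one perturbs the evidence repository seen by a toy retrieval-based fact checker. Everything here is an illustrative assumption rather than the paper's implementation: the token-overlap retriever, the example corpus, and the hand-written fabrication and edit stand in for the neural document generators used in the actual attacks.

import string
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and strip punctuation; a stand-in for real preprocessing.
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()

def overlap_score(claim: str, doc: str) -> float:
    # Toy relevance score: fraction of claim tokens that appear in the document.
    claim_counts = Counter(tokenize(claim))
    doc_vocab = set(tokenize(doc))
    hits = sum(n for tok, n in claim_counts.items() if tok in doc_vocab)
    return hits / max(sum(claim_counts.values()), 1)

def retrieve(claim: str, corpus: list[str], k: int = 1) -> list[str]:
    # The fact-checker's retrieval step: rank the repository against the claim.
    return sorted(corpus, key=lambda d: overlap_score(claim, d), reverse=True)[:k]

corpus = [
    "Large studies have found no link between coffee and cancer.",
    "The city council voted to expand the transit network in 2020.",
]
claim = "coffee causes cancer"

print("clean evidence:    ", retrieve(claim, corpus))

# ADVERSARIAL ADDITION: a fabricated document is appended to the repository;
# here a hand-written string stands in for the output of a neural generator.
fabricated = "A new report reveals that coffee causes cancer in most adults."
print("after addition:    ", retrieve(claim, corpus + [fabricated]))

# ADVERSARIAL MODIFICATION: existing evidence is altered in place; a crude
# string substitution stands in for an automatic rewriting model.
modified = [doc.replace("no link", "a clear link") for doc in corpus]
print("after modification:", retrieve(claim, modified))

In both poisoned conditions the downstream verdict model reads attacker-controlled text as its evidence, which is exactly the failure mode the two simulated settings are designed to measure.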
