Evaluating adversarial attacks against multiple fact verification systems

Automated fact verification has been progressing owing to advances in modeling and the availability of large datasets. Given the nature of the task, it is critical to understand the vulnerabilities of these systems to adversarial instances designed to make them predict incorrectly. We introduce two novel scoring metrics, attack potency and system resilience, which take into account the correctness of the adversarial instances, an aspect often ignored in adversarial evaluations. We consider six fact verification systems from the recent Fact Extraction and VERification (FEVER) challenge: the four best-scoring systems and two baselines. We evaluate adversarial instances generated by a recently proposed state-of-the-art method, a paraphrasing method, and rule-based attacks devised specifically for fact verification. We find that our rule-based attacks have higher potency, and that, while the rankings among the top systems changed, they exhibited higher resilience than the baselines.
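
For illustration only, here is a minimal sketch of how correctness-aware metrics of this kind might be computed. The abstract does not give the formulas, so the function names `potency` and `resilience` and the exact definitions below are assumptions, not the paper's actual metrics; they only show the idea of discounting adversarial instances that are themselves incorrect.

```python
# Hypothetical sketch: correctness-weighted attack potency and system resilience.
# Assumed definitions, chosen to illustrate "taking correctness into account";
# the paper's own formulas may differ.

from typing import List


def potency(correct: List[bool], fooled: List[bool]) -> float:
    """Fraction of adversarial instances that are both correctly constructed
    and cause the target system to predict incorrectly (assumed definition)."""
    assert len(correct) == len(fooled)
    if not correct:
        return 0.0
    return sum(c and f for c, f in zip(correct, fooled)) / len(correct)


def resilience(attack_potencies: List[float]) -> float:
    """A system's resilience, assumed here to be one minus the mean potency
    of the attacks evaluated against it."""
    if not attack_potencies:
        return 1.0
    return 1.0 - sum(attack_potencies) / len(attack_potencies)
```

Under this reading, an attack that fools a system on many instances but produces mostly malformed or mislabeled claims would still score low potency, which matches the motivation stated in the abstract.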
