BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks

Adversarial attacks expose important blind spots of deep learning systems. While word- and sentence-level attack scenarios mostly deal with finding semantic paraphrases of the input that fool NLP models, character-level attacks typically insert typos into the input stream. It is commonly thought that these are easier to defend against via spelling correction modules. In this work, we show that both a standard spellchecker and the approach of Pruthi et al. (2019), which is trained to defend against insertions, deletions and swaps, perform poorly on the character-level benchmark recently proposed by Eger and Benz (2020), which includes more challenging attacks such as visual and phonetic perturbations and missing word segmentations. In contrast, we show that an untrained iterative approach which combines context-independent character-level information with context-dependent information from BERT’s masked language modeling can perform on par with human crowd-workers from Amazon Mechanical Turk (AMT) supervised via 3-shot learning.
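The combination the abstract describes, a context-independent character-level channel model multiplied with a context-dependent word distribution, can be sketched as follows. This is a minimal illustration, not the paper's method: the `exp(-edit_distance)` channel and the toy `context_prob` dictionary are stand-ins for the paper's character-level probabilities and for BERT's masked-LM distribution over the corrupted slot.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the Wagner-Fischer dynamic program (cf. [22])."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def rank_candidates(noisy: str, candidates: list, context_prob: dict) -> list:
    """Rank candidate restorations of a perturbed token.

    Each candidate is scored by channel * context:
      - channel:  exp(-edit_distance), a crude context-independent proxy
        for how plausibly `noisy` arose from the candidate;
      - context:  context_prob, standing in for BERT's masked-LM
        probability of the candidate in the sentence slot.
    """
    def score(cand: str) -> float:
        return math.exp(-levenshtein(noisy, cand)) * context_prob.get(cand, 1e-12)

    return sorted(candidates, key=score, reverse=True)


# Both candidates are edit distance 2 from "hte"; context breaks the tie.
best = rank_candidates("hte", ["the", "hat"], {"the": 0.9, "hat": 0.05})[0]
```

In the actual system, the context term would come from querying BERT with the corrupted position masked and iterating the procedure over the sentence; the sketch only shows how the two information sources are combined multiplicatively.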

[1] Kai-Wei Chang, et al. Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification, 2019, EMNLP.

[2] Elizabeth Salesky, et al. Robust Open-Vocabulary Translation from Visual Text Representations, 2021, EMNLP.

[3] Fei Liu, et al. MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance, 2019, EMNLP.

[4] Iryna Gurevych, et al. Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems, 2019, NAACL.

[5] Jinfeng Li, et al. TextShield: Robust Text Classification Based on Multimodal Embedding and Neural Machine Translation, 2020, USENIX Security Symposium.

[6] Aditi Raghunathan, et al. Robust Encodings: A Framework for Combating Adversarial Typos, 2020, ACL.

[7] Mani B. Srivastava, et al. Generating Natural Language Adversarial Examples, 2018, EMNLP.

[8] Eric P. Xing, et al. Word Shape Matters: Robust Machine Translation with Visual Embedding, 2020, ArXiv.

[9] Bhuwan Dhingra, et al. Combating Adversarial Misspellings with Robust Word Recognition, 2019, ACL.

[10] Carlos Guestrin, et al. Semantically Equivalent Adversarial Rules for Debugging NLP Models, 2018, ACL.

[11] Alexander Mehler, et al. A Comparison of Four Character-Level String-to-String Translation Models for (OCR) Spelling Error Correction, 2016, Prague Bull. Math. Linguistics.

[12] Yifei Hu, et al. Misspelling Correction with Pre-trained Contextual Language Model, 2020, IEEE 19th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC).

[13] Luke S. Zettlemoyer, et al. Adversarial Example Generation with Syntactically Controlled Paraphrase Networks, 2018, NAACL.

[14] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.

[15] Xipeng Qiu, et al. BERT-ATTACK: Adversarial Attack against BERT Using BERT, 2020, EMNLP.

[16] Grzegorz Kondrak, et al. Applying Many-to-Many Alignments and Hidden Markov Models to Letter-to-Phoneme Conversion, 2007, NAACL.

[17] Zhiyuan Liu, et al. Word-level Textual Adversarial Attacking as Combinatorial Optimization, 2019, ACL.

[18] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.

[19] Yonatan Belinkov, et al. Synthetic and Natural Noise Both Break Neural Machine Translation, 2017, ICLR.

[20] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[21] Steffen Eger, et al. From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks, 2020, AACL.

[22] Michael J. Fischer, et al. The String-to-String Correction Problem, 1974, JACM.

[23] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[24] Wei Liu, et al. Token-Modification Adversarial Attacks for Natural Language Processing: A Survey, 2021, ArXiv.

[25] Jessica B. Hamrick, et al. psiTurk: An Open-Source Framework for Conducting Replicable Behavioral Experiments Online, 2016, Behavior Research Methods.

[26] Dejing Dou, et al. HotFlip: White-Box Adversarial Examples for Text Classification, 2017, ACL.

[27] Steffen Eger. Do We Need Bigram Alignment Models? On the Effect of Alignment Quality on Transduction Accuracy in G2P, 2015, EMNLP.

[28] Dejing Dou, et al. On Adversarial Examples for Character-Level Neural Machine Translation, 2018, COLING.

[29] Zhiyuan Liu, et al. OpenAttack: An Open-source Textual Adversarial Attack Toolkit, 2020, ACL.

[30] Peter Szolovits, et al. Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment, 2020, AAAI.