Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples

We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks. We perform a fine-grained analysis of three elements relevant to search: search algorithm, search space, and search budget. When new search algorithms are proposed in past work, the attack search space is often modified alongside the search algorithm. Without ablation studies benchmarking the search algorithm change with the search space held constant, one cannot tell if an increase in attack success rate is a result of an improved search algorithm or a less restrictive search space. Additionally, many previous studies fail to properly consider the search algorithms’ run-time cost, which is essential for downstream tasks like adversarial training. Our experiments provide a reproducible benchmark of search algorithms across a variety of search spaces and query budgets to guide future research in adversarial NLP. Based on our experiments, we recommend greedy attacks with word importance ranking when under a time constraint or attacking long inputs, and either beam search or particle swarm optimization otherwise.

[1]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[2]  Bo Pang,et al.  Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales , 2005, ACL.

[3]  Qiang Dong,et al.  Hownet And The Computation Of Meaning , 2006 .

[4]  Qiang Dong,et al.  HowNet and Its Computation of Meaning , 2010, COLING.

[5]  Thorsten Brants,et al.  One billion word benchmark for measuring progress in statistical language modeling , 2013, INTERSPEECH.

[6]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[7]  Xiang Zhang,et al.  Character-level Convolutional Networks for Text Classification , 2015, NIPS.

[8]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[9]  Christopher Potts,et al.  A large annotated corpus for learning natural language inference , 2015, EMNLP.

[10]  David Vandyke,et al.  Counter-fitting Word Vectors to Linguistic Constraints , 2016, NAACL.

[11]  Luke S. Zettlemoyer,et al.  Adversarial Example Generation with Syntactically Controlled Paraphrase Networks , 2018, NAACL.

[12]  Roland Vollgraf,et al.  Contextual String Embeddings for Sequence Labeling , 2018, COLING.

[13]  Carlos Guestrin,et al.  Semantically Equivalent Adversarial Rules for Debugging NLP models , 2018, ACL.

[14]  Dejing Dou,et al.  HotFlip: White-Box Adversarial Examples for Text Classification , 2017, ACL.

[15]  Nan Hua,et al.  Universal Sentence Encoder , 2018, ArXiv.

[16]  Mani B. Srivastava,et al.  Generating Natural Language Adversarial Examples , 2018, EMNLP.

[17]  Yanjun Qi,et al.  Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers , 2018, 2018 IEEE Security and Privacy Workshops (SPW).

[18]  Wanxiang Che,et al.  Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency , 2019, ACL.

[19]  Quan Z. Sheng,et al.  Generating Textual Adversarial Examples for Deep Learning Models: A Survey , 2019, ArXiv.

[20]  Aditi Raghunathan,et al.  Certified Robustness to Adversarial Word Substitutions , 2019, EMNLP.

[21]  Alexandros G. Dimakis,et al.  Discrete Adversarial Attacks and Submodular Optimization with Applications to Text Classification , 2018, MLSys.

[22]  Ilya Sutskever,et al.  Language Models are Unsupervised Multitask Learners , 2019 .

[23]  Hyun Oh Song,et al.  Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization , 2019, ICML.

[24]  Peter Szolovits,et al.  Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment , 2019, ArXiv.

[25]  Lei Li,et al.  Generating Fluent Adversarial Examples for Natural Languages , 2019, ACL.

[26]  Sameer Singh,et al.  AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models , 2019, EMNLP.

[27]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[28]  Kilian Q. Weinberger,et al.  BERTScore: Evaluating Text Generation with BERT , 2019, ICLR.

[29]  John X. Morris,et al.  Reevaluating Adversarial Examples in Natural Language , 2020, FINDINGS.

[30]  Meng Zhang,et al.  Textual Adversarial Attack as Combinatorial Optimization , 2019, 1910.12196.

[31]  John X. Morris,et al.  TextAttack: A Framework for Adversarial Attacks in Natural Language Processing , 2020, ArXiv.