Towards Robustness Against Natural Language Word Substitutions

Robustness against word substitutions has a well-defined and widely accepted form, i.e., substituting words with semantically similar ones, and is therefore regarded as a fundamental stepping stone towards broader robustness in natural language processing. Previous defense methods capture word substitutions in vector space with either an l2-ball or a hyper-rectangle, yielding perturbation sets that are either not inclusive enough or unnecessarily large, which hinders mimicking worst cases for robust training. In this paper, we introduce a novel Adversarial Sparse Convex Combination (ASCC) method. We model the word substitution attack space as a convex hull and leverage a regularization term to push each perturbation towards an actual substitution, thus aligning our modeling better with the discrete textual space. Based on the ASCC method, we further propose ASCC-defense, which uses ASCC to generate worst-case perturbations and incorporates adversarial training towards robustness. Experiments show that ASCC-defense outperforms the current state of the art in robustness on two prevailing NLP tasks, i.e., sentiment analysis and natural language inference, against several attacks and across multiple model architectures. In addition, we envision a new class of defenses towards robustness in NLP, in which our robustly trained word vectors can be plugged into a normally trained model and improve its robustness without applying any other defense techniques.
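To make the convex-hull idea concrete, below is a minimal PyTorch-style sketch, not the authors' implementation: each word's adversarial embedding is a convex combination of the embeddings of its allowed substitutions, the combination weights are optimized to maximize the task loss, and an entropy penalty encourages sparse weights that concentrate on a single actual substitution. The function name `ascc_perturb`, the tensor layout, and the weight `alpha` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ascc_perturb(sub_embs, task_loss_fn, steps=10, lr=1.0, alpha=1.0):
    """Sketch of a convex-hull word-substitution perturbation (assumed interface).

    sub_embs:     [seq_len, num_subs, dim] embeddings of candidate substitutions
                  for each position (the original word included as a candidate).
    task_loss_fn: maps a [seq_len, dim] perturbed embedding sequence to a scalar loss.
    Returns the worst-case convex combination of substitution embeddings.
    """
    # Unnormalized logits over candidates; softmax keeps the combination
    # weights on the probability simplex, i.e., inside the convex hull.
    w = torch.zeros(sub_embs.shape[:2], requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        probs = F.softmax(w, dim=-1)                          # [seq_len, num_subs]
        perturbed = (probs.unsqueeze(-1) * sub_embs).sum(1)   # convex combination
        # Entropy regularizer: low entropy concentrates the weights on one
        # actual substitution, keeping the perturbation close to discrete text.
        entropy = -(probs * torch.log(probs + 1e-12)).sum(-1).mean()
        # Maximize task loss while minimizing entropy -> minimize the negative.
        loss = -(task_loss_fn(perturbed) - alpha * entropy)
        opt.zero_grad()
        loss.backward()
        opt.step()
    probs = F.softmax(w, dim=-1)
    return (probs.unsqueeze(-1) * sub_embs).sum(1).detach()
```

In an adversarial-training loop, one would call such a routine to obtain the perturbed embedding sequence and then train the model on the resulting loss; the exact schedule and regularization strength used in ASCC-defense are described in the paper itself.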
