Adversarial Attack and Defense of Structured Prediction Models

Building an effective adversarial attacker and elaborating on countermeasures for adversarial attacks for natural language processing (NLP) have attracted a lot of research in recent years. However, most of the existing approaches focus on classification problems. In this paper, we investigate attacks and defenses for structured prediction tasks in NLP. Besides the difficulty of perturbing discrete words and the sentence fluency problem faced by attackers in any NLP tasks, there is a specific challenge to attackers of structured prediction models: the structured output of structured prediction models is sensitive to small perturbations in the input. To address these problems, we propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model with feedbacks from multiple reference models of the same structured prediction task. Based on the proposed attack, we further reinforce the victim model with adversarial training, making its prediction more robust and accurate. We evaluate the proposed framework in dependency parsing and part-of-speech tagging. Automatic and human evaluations show that our proposed framework succeeds in both attacking state-of-the-art structured prediction models and boosting them with adversarial training.

[1]  Eduard H. Hovy,et al.  End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF , 2016, ACL.

[2]  Erik Nijkamp,et al.  Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation , 2020, ACL.

[3]  Yonatan Belinkov,et al.  Synthetic and Natural Noise Both Break Neural Machine Translation , 2017, ICLR.

[4]  Mani B. Srivastava,et al.  Generating Natural Language Adversarial Examples , 2018, EMNLP.

[5]  Dejing Dou,et al.  HotFlip: White-Box Adversarial Examples for Text Classification , 2017, ACL.

[6]  Timothy Dozat,et al.  Deep Biaffine Attention for Neural Dependency Parsing , 2016, ICLR.

[7]  Wang Ling,et al.  Two/Too Simple Adaptations of Word2Vec for Syntax Problems , 2015, NAACL.

[8]  Yejin Choi,et al.  Learning to Write with Cooperative Discriminators , 2018, ACL.

[9]  Sameep Mehta,et al.  Towards Crafting Text Adversarial Samples , 2017, ArXiv.

[10]  Li Zhao,et al.  Attention-based LSTM for Aspect-level Sentiment Classification , 2016, EMNLP.

[11]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[12]  Kilian Q. Weinberger,et al.  BERTScore: Evaluating Text Generation with BERT , 2019, ICLR.

[13]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[14]  Facebook,et al.  Houdini : Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples , 2017 .

[15]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[16]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[17]  Bo Li,et al.  Adversarial Texts with Gradient Methods , 2018, ArXiv.

[18]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[19]  Jingzhou Liu,et al.  Stack-Pointer Networks for Dependency Parsing , 2018, ACL.

[20]  Dilin Wang,et al.  Improving Neural Language Modeling via Adversarial Training , 2019, ICML.

[21]  Ann Bies,et al.  The Penn Treebank: Annotating Predicate Argument Structure , 1994, HLT.

[22]  Ilya Sutskever,et al.  Language Models are Unsupervised Multitask Learners , 2019 .

[23]  Stephan Gunnemann,et al.  Adversarial Attacks on Graph Neural Networks via Meta Learning , 2019, ICLR.

[24]  Percy Liang,et al.  Adversarial Examples for Evaluating Reading Comprehension Systems , 2017, EMNLP.

[25]  Sameer Singh,et al.  Generating Natural Adversarial Examples , 2017, ICLR.

[26]  Eliyahu Kiperwasser,et al.  Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations , 2016, TACL.

[27]  Dan Boneh,et al.  Ensemble Adversarial Training: Attacks and Defenses , 2017, ICLR.

[28]  Ananthram Swami,et al.  Crafting adversarial input sequences for recurrent neural networks , 2016, MILCOM 2016 - 2016 IEEE Military Communications Conference.

[29]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[30]  Lei Li,et al.  Generating Fluent Adversarial Examples for Natural Languages , 2019, ACL.

[31]  Jungo Kasai,et al.  Robust Multilingual Part-of-Speech Tagging via Adversarial Training , 2017, NAACL.

[32]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[33]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[34]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[35]  Andrew M. Dai,et al.  Adversarial Training Methods for Semi-Supervised Text Classification , 2016, ICLR.

[36]  Graham Neubig,et al.  On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models , 2019, NAACL.

[37]  Radha Poovendran,et al.  Deceiving Google's Perspective API Built for Detecting Toxic Comments , 2017, ArXiv.

[38]  Hiroyuki Shindo,et al.  Interpretable Adversarial Perturbation in Input Embedding Space for Text , 2018, IJCAI.