Paper information - Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data

We present a probabilistic framework for studying adversarial attacks on discrete data. Based on this framework, we derive a perturbation-based method, Greedy Attack, and a scalable learning-based method, Gumbel Attack, that illustrate various tradeoffs in the design of attacks. We demonstrate the effectiveness of these methods using both quantitative metrics and human evaluation on various state-of-the-art models for text classification, including a word-based CNN, a character-based CNN, and an LSTM. As an example of our results, we show that the accuracy of character-based convolutional networks drops to the level of random selection when only five characters are modified through Greedy Attack.
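To make the idea of a perturbation-based attack with a fixed edit budget concrete, the following is a minimal sketch of a generic greedy substitution attack. It is an illustration under stated assumptions, not necessarily the paper's exact algorithm: the function `greedy_attack`, the stand-in model `toy_true_prob`, the `<pad>` mask token, and the tiny vocabulary are all hypothetical names introduced here. The sketch first ranks positions by how much masking them lowers the true-class probability, then tries substitutes at the top `k` positions one at a time.

```python
import math
from typing import Callable, List, Sequence


def greedy_attack(
    tokens: Sequence[str],
    true_prob: Callable[[Sequence[str]], float],
    vocab: Sequence[str],
    k: int,
    mask: str = "<pad>",
) -> List[str]:
    """Greedily edit at most k positions to lower the true-class probability."""
    tokens = list(tokens)
    base = true_prob(tokens)

    # Stage 1: score each position by the probability drop when it is masked.
    drops = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        drops.append((base - true_prob(masked), i))
    # Keep the k positions with the largest drop (ties broken by index).
    ranked = sorted(drops, key=lambda d: (-d[0], d[1]))
    positions = [i for _, i in ranked[:k]]

    # Stage 2: at each chosen position, keep the substitute that gives the
    # lowest true-class probability, holding earlier edits fixed.
    for i in positions:
        best_tok, best_p = tokens[i], true_prob(tokens)
        for cand in vocab:
            trial = tokens[:i] + [cand] + tokens[i + 1:]
            p = true_prob(trial)
            if p < best_p:
                best_tok, best_p = cand, p
        tokens[i] = best_tok
    return tokens


def toy_true_prob(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for a classifier: P(positive) from word counts."""
    pos = {"great", "good"}
    neg = {"bad", "awful"}
    score = sum(t in pos for t in tokens) - sum(t in neg for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


adv = greedy_attack(["a", "great", "movie"], toy_true_prob,
                    ["bad", "fine", "great"], k=1)
print(adv)  # ['a', 'bad', 'movie']
```

The budget `k` plays the role of the "five characters" in the abstract's example; a real attack would query the target model in place of `toy_true_prob` and operate on characters or words as appropriate.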
