POPQORN: Quantifying Robustness of Recurrent Neural Networks

Vulnerability to adversarial attacks is a critical issue for deep neural networks, and addressing it requires a reliable way to evaluate the robustness of a network. Recently, several methods have been developed to compute $\textit{robustness quantification}$ for neural networks, namely, certified lower bounds of the minimum adversarial perturbation. Such methods, however, were devised for feed-forward networks, e.g., multi-layer perceptrons or convolutional networks. It remains an open problem to quantify robustness for recurrent networks, especially LSTMs and GRUs. Such networks pose additional challenges for robustness quantification, such as handling inputs at multiple time steps and the interaction between gates and states. In this work, we propose $\textit{POPQORN}$ ($\textbf{P}$ropagated-$\textbf{o}$ut$\textbf{p}$ut $\textbf{Q}$uantified R$\textbf{o}$bustness for $\textbf{RN}$Ns), a general algorithm to quantify the robustness of RNNs, including vanilla RNNs, LSTMs, and GRUs. We demonstrate its effectiveness on different network architectures and show that robustness quantification on individual input steps can lead to new insights.
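To make the quantification concrete: a certified lower bound $\underline{\epsilon}$ guarantees that the prediction $c = \arg\max_j f_j(x)$ is unchanged for every perturbation $\delta$ with $\|\delta\|_p \le \underline{\epsilon}$, so $\underline{\epsilon}$ never exceeds the true minimum adversarial perturbation. The sketch below illustrates this idea on a vanilla RNN using simple interval bound propagation rather than POPQORN's tighter linear bounds; all weight and function names here are hypothetical, and this is an illustration of the certification concept, not the paper's algorithm.

```python
# Minimal sketch (not POPQORN itself): certify a vanilla tanh RNN by
# interval bound propagation, then binary-search the largest certified
# l_inf radius. POPQORN instead propagates linear lower/upper bounds,
# which are much tighter, and also handles LSTM/GRU gate interactions.
import numpy as np

def ibp_rnn_bounds(seq, eps, W_x, W_h, b, W_out):
    """Lower/upper bounds on the logits when every input frame may be
    perturbed by at most eps in the l_inf norm."""
    h_lo = h_hi = np.zeros(W_h.shape[0])
    for x in seq:                       # propagate bounds through each step
        x_lo, x_hi = x - eps, x + eps   # l_inf ball around each input frame
        # split weights into positive/negative parts to bound W @ h exactly
        Wh_p, Wh_n = np.maximum(W_h, 0), np.minimum(W_h, 0)
        Wx_p, Wx_n = np.maximum(W_x, 0), np.minimum(W_x, 0)
        pre_lo = Wh_p @ h_lo + Wh_n @ h_hi + Wx_p @ x_lo + Wx_n @ x_hi + b
        pre_hi = Wh_p @ h_hi + Wh_n @ h_lo + Wx_p @ x_hi + Wx_n @ x_lo + b
        h_lo, h_hi = np.tanh(pre_lo), np.tanh(pre_hi)  # tanh is monotone
    Wo_p, Wo_n = np.maximum(W_out, 0), np.minimum(W_out, 0)
    return Wo_p @ h_lo + Wo_n @ h_hi, Wo_p @ h_hi + Wo_n @ h_lo

def certified_radius(seq, label, weights, hi=1.0, iters=20):
    """Binary-search the largest eps at which the true class provably wins."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        out_lo, out_hi = ibp_rnn_bounds(seq, mid, *weights)
        worst_other = np.max(np.delete(out_hi, label))
        lo, hi = (mid, hi) if out_lo[label] > worst_other else (lo, mid)
    return lo  # a certified lower bound on the minimum adversarial perturbation
```

Replacing the crude interval step with per-step linear lower and upper bounds on the nonlinearities (and, for LSTMs and GRUs, on the products between gates and states) is, roughly, what tightens such a certificate into the kind of bound POPQORN computes.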
