Disentangling Adversarial Robustness and Generalization

Obtaining deep networks that are robust against adversarial examples and generalize well is an open problem. A recent hypothesis even states that both robust and accurate models are impossible, i.e., that adversarial robustness and generalization are conflicting goals. In an effort to clarify the relationship between robustness and generalization, we assume an underlying, low-dimensional data manifold and show that: 1. regular adversarial examples leave the manifold; 2. adversarial examples constrained to the manifold, i.e., on-manifold adversarial examples, exist; 3. on-manifold adversarial examples are generalization errors, and on-manifold adversarial training boosts generalization; 4. regular robustness and generalization are not necessarily contradictory goals. These findings imply that both robust and accurate models are possible. However, different models (architectures, training strategies, etc.) can exhibit different robustness and generalization characteristics. To confirm our claims, we present extensive experiments on synthetic data (with a known manifold) as well as on EMNIST, Fashion-MNIST, and CelebA.

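The key distinction above is between regular adversarial examples, computed by perturbing the image directly, and on-manifold adversarial examples, computed by perturbing the latent code of a generative model so that the result stays on the learned data manifold. The following is a minimal sketch of the latter in PyTorch, assuming a pre-trained decoder and classifier; the tiny network definitions, the name on_manifold_attack, and all hyperparameters are illustrative placeholders and not the paper's exact setup.

# Minimal sketch (hypothetical names): on-manifold adversarial examples via
# projected gradient ascent in the latent space of a decoder G, so that the
# adversarial image G(z + delta) stays on the learned manifold.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM, IMG_DIM, NUM_CLASSES = 8, 28 * 28, 10

# Stand-ins for a pre-trained decoder and classifier (assumptions, untrained here).
decoder = nn.Sequential(
    nn.Linear(LATENT_DIM, 128), nn.ReLU(),
    nn.Linear(128, IMG_DIM), nn.Sigmoid(),
)
classifier = nn.Sequential(
    nn.Linear(IMG_DIM, 64), nn.ReLU(),
    nn.Linear(64, NUM_CLASSES),
)

def on_manifold_attack(z, label, epsilon=0.3, step=0.05, iters=20):
    """Maximize the cross-entropy loss w.r.t. a latent perturbation delta."""
    delta = torch.zeros_like(z, requires_grad=True)
    for _ in range(iters):
        logits = classifier(decoder(z + delta))
        loss = F.cross_entropy(logits, label)
        loss.backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()   # ascend the loss
            delta.clamp_(-epsilon, epsilon)     # stay close to the original code
        delta.grad.zero_()
    return decoder(z + delta).detach()          # adversarial image on the manifold

# Usage: one latent code with a (pretend) true label 3.
z = torch.randn(1, LATENT_DIM)
label = torch.tensor([3])
x_adv = on_manifold_attack(z, label)
print(x_adv.shape)  # torch.Size([1, 784])

A regular (off-manifold) attack would run the same projected gradient ascent directly on the image x instead of the latent code; on-manifold adversarial training, as described in the abstract, augments training with examples produced as above.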