Adversarial vulnerability for any classifier

Despite achieving impressive performance, state-of-the-art classifiers remain highly vulnerable to small, imperceptible adversarial perturbations. This vulnerability has empirically proven very difficult to address. In this paper, we study the phenomenon of adversarial perturbations under the assumption that the data is generated by a smooth generative model. We derive fundamental upper bounds on the robustness of any classification function to perturbations, and prove the existence of adversarial perturbations that transfer well across different classifiers with small risk. Our analysis of robustness also provides insights into key properties of generative models, such as their smoothness and the dimensionality of their latent space. We conclude with numerical experiments showing that our bounds provide informative baselines for the maximal achievable robustness on several datasets.
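
To make the generative-model assumption concrete, the following is a minimal sketch of the kind of isoperimetric argument that underlies bounds of this type; the notation (a standard Gaussian latent variable z, an L-Lipschitz generator g, a classifier f, and a latent radius \eta) is chosen here for illustration and is not quoted from the paper.

Setup: z \sim \mathcal{N}(0, I_d), \quad x = g(z), \quad g : \mathbb{R}^d \to \mathbb{R}^m \ \text{L-Lipschitz}, \quad f \ \text{any classifier}.

Gaussian isoperimetric inequality (Borell; Sudakov and Tsirelson): if A \subseteq \mathbb{R}^d has Gaussian measure \mu(A) = \Phi(a), then its \eta-enlargement satisfies

    \mu(A_\eta) \ge \Phi(a + \eta), \qquad A_\eta = \{ z : \mathrm{dist}(z, A) \le \eta \}.

Applied to the complement of a class region A_c = \{ z : f(g(z)) = c \} with \mu(A_c) \le 1/2 (so the complement has measure at least \Phi(0) = 1/2), this gives

    \mu\big(\{ z : \mathrm{dist}(z, A_c^{\mathrm{c}}) \le \eta \}\big) \ge \Phi(\eta),

i.e. all but a \Phi(-\eta) fraction of latent points lie within latent distance \eta of a point that g maps to a different class. The Lipschitz property of the generator transfers this to in-distribution image-space perturbations,

    \| g(z + \delta) - g(z) \| \le L \, \| \delta \|,

so a latent perturbation of norm \eta that crosses the decision boundary of f \circ g yields an on-manifold perturbation of x of norm at most L\eta that changes the decision of f, whatever the classifier. The resulting bound degrades with the generator's Lipschitz constant, which is consistent with the abstract's remark that the smoothness of the generative model and the properties of its latent space govern the maximal achievable robustness.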
