Towards Deep Learning Models Resistant to Adversarial Attacks

Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples---inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. Code and pre-trained models are available at this https URL and this https URL.
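At the center of this robust optimization view is a saddle-point (min-max) objective: an inner maximization that searches for the worst-case perturbation delta within an allowed set (for instance an l_infinity ball of radius epsilon around each input), and an outer minimization that trains the network parameters theta against those worst-case inputs, i.e. min_theta E_{(x,y)~D} [ max_{||delta||_inf <= epsilon} L(theta, x + delta, y) ]. The sketch below illustrates this idea with projected gradient descent (PGD) as the inner first-order adversary. It is a minimal PyTorch illustration, not the authors' released implementation; the hyperparameters (epsilon, alpha, steps), the assumption that inputs lie in [0, 1], and the helper names pgd_attack and adversarial_training_step are illustrative choices.

    # Minimal sketch of l_inf PGD adversarial training (PyTorch).
    # epsilon, alpha, and the number of steps are illustrative values,
    # not the paper's exact settings; inputs are assumed to lie in [0, 1].
    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, epsilon=0.3, alpha=0.01, steps=40):
        """Approximately solve the inner maximization by projected gradient ascent."""
        x = x.detach()
        # Random start inside the l_inf ball around the clean input.
        x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0.0, 1.0)
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad, = torch.autograd.grad(loss, x_adv)
            with torch.no_grad():
                x_adv = x_adv + alpha * grad.sign()                            # ascent step on the loss
                x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon)  # project back onto the l_inf ball
                x_adv = x_adv.clamp(0.0, 1.0)                                  # keep a valid image
        return x_adv.detach()

    def adversarial_training_step(model, optimizer, x, y):
        """One outer-minimization step: update the weights on worst-case (PGD) examples."""
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
        return loss.item()

In this framing, the "first-order adversary" of the abstract is any attack that uses only gradient information about the loss, with PGD serving as a natural canonical instance of such an adversary.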
