Adversarial Robustness on In- and Out-Distribution Improves Explainability

Neural networks have led to major improvements in image classification, but they remain non-robust to adversarial changes, produce unreliable uncertainty estimates on out-of-distribution samples, and make inscrutable black-box decisions. In this work we propose RATIO, a training procedure for Robustness via Adversarial Training on In- and Out-distribution, which yields robust models with reliable and robust confidence estimates on the out-distribution. RATIO has generative properties similar to adversarial training, so that visual counterfactuals produce class-specific features. While adversarial training usually comes at the price of lower clean accuracy, RATIO achieves state-of-the-art $l_2$-adversarial robustness on CIFAR10 while maintaining better clean accuracy than standard adversarial training.
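The following is a minimal sketch of what a RATIO-style training step could look like for a PyTorch image classifier. It is not the authors' released implementation: the helper names (`l2_pgd`, `ratio_step`), the attack hyperparameters, and the use of cross-entropy to the uniform distribution as the out-distribution loss are illustrative assumptions; the choice of out-distribution data and the training schedule are left to the paper.

```python
import torch
import torch.nn.functional as F


def l2_pgd(model, x, loss_fn, eps, steps=7, step_size=None):
    """Maximize loss_fn(model(x + delta)) over an l2 ball of radius eps.

    Assumes 4D image batches of shape (N, C, H, W)."""
    step_size = step_size if step_size is not None else 2.0 * eps / steps
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta))
        grad = torch.autograd.grad(loss, delta)[0]
        # ascend along the l2-normalized gradient
        g_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta = delta + step_size * grad / g_norm
        # project back onto the l2 ball of radius eps
        d_norm = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
        delta = (delta * (eps / d_norm).clamp(max=1.0)).detach().requires_grad_(True)
    return (x + delta).detach()


def uniform_ce(logits):
    # cross-entropy to the uniform distribution (summed over the batch);
    # minimal iff the prediction is uniform, so minimizing it during training
    # pushes out-distribution confidence down, while maximizing it inside the
    # attack finds the most confident point in the perturbation ball
    return -F.log_softmax(logits, dim=1).mean(dim=1).sum()


def ratio_step(model, optimizer, x_in, y_in, x_out, eps_in, eps_out, lam=1.0):
    """One step: adversarial training on the in-distribution plus adversarial
    low-confidence enforcement on the out-distribution."""
    model.eval()
    # worst-case in-distribution point: maximize the classification loss
    x_in_adv = l2_pgd(model, x_in,
                      lambda logits: F.cross_entropy(logits, y_in), eps_in)
    # worst-case out-distribution point: maximize the confidence-type loss
    x_out_adv = l2_pgd(model, x_out, uniform_ce, eps_out)

    model.train()
    optimizer.zero_grad()
    loss = (F.cross_entropy(model(x_in_adv), y_in)
            + lam * uniform_ce(model(x_out_adv)) / x_out.shape[0])
    loss.backward()
    optimizer.step()
    return loss.item()
```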
