Metrics and methods for robustness evaluation of neural networks with generative models

Recent studies have shown that modern deep neural network classifiers are easy to fool if an adversary can slightly modify their inputs. Many papers have proposed adversarial attacks, defenses, and methods to measure robustness to such adversarial perturbations. However, the most commonly considered adversarial examples are based on $\ell_p$-bounded perturbations in the input space of the neural network, which are unlikely to arise naturally. Recently, especially in computer vision, researchers have identified "natural" or "semantic" perturbations, such as rotations, changes of brightness, or more high-level changes, but these perturbations have not yet been systematically used to measure the performance of classifiers. In this paper, we propose several metrics to measure the robustness of classifiers to natural adversarial examples, along with methods to evaluate them. These metrics, called latent space performance metrics, are based on the ability of generative models to capture probability distributions, and are defined in their latent spaces. In three image classification case studies, we evaluate the proposed metrics for several classifiers, including ones trained conventionally and ones trained to be robust. We find that the latent counterparts of adversarial robustness are associated with the accuracy of the classifier rather than with its conventional adversarial robustness, although the latter is still reflected in the properties of the latent perturbations that are found. In addition, our novel method of finding latent adversarial perturbations demonstrates that these perturbations are often perceptually small.
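To make the idea of latent adversarial perturbations concrete, the following is a minimal sketch (not the authors' exact algorithm) of a gradient-based search in the latent space of a generative model: starting from a latent code `z`, it looks for a small offset `delta` such that the classifier's label on `G(z + delta)` differs from its label on `G(z)`. The generator `G`, classifier `f`, and all hyperparameters are assumed placeholders.

```python
# Illustrative sketch of a latent-space adversarial search, assuming any
# pretrained PyTorch generator G (latent -> image) and classifier f
# (image -> logits) with compatible shapes; not the paper's exact method.
import torch

def latent_adversarial_perturbation(G, f, z, steps=200, lr=0.05, reg=0.1):
    """Search for a small latent perturbation delta that flips f's label on G(z)."""
    G.eval(); f.eval()
    with torch.no_grad():
        y0 = f(G(z)).argmax(dim=1)          # label of the unperturbed sample

    delta = torch.zeros_like(z, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        logits = f(G(z + delta))
        if logits.argmax(dim=1) != y0:      # stop once the predicted label flips
            break
        # Push the prediction away from the clean label while penalizing
        # the norm of the latent perturbation to keep it small.
        loss = -torch.nn.functional.cross_entropy(logits, y0) + reg * delta.norm()
        opt.zero_grad()
        loss.backward()
        opt.step()

    return delta.detach()
```

In this sketch the perturbation size is measured directly in the latent space; the resulting image-space change `G(z + delta) - G(z)` is what would be inspected for perceptual smallness.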
