Interpreting Adversarial Robustness: A View from Decision Surface in Input Space

One popular hypothesis about neural network generalization is that flat local minima of the loss surface in parameter space lead to good generalization. We demonstrate, however, that the loss surface in parameter space has no obvious relationship with generalization, especially under adversarial settings. By visualizing decision surfaces in both parameter space and input space, we instead show that the geometry of the decision surface in input space correlates well with adversarial robustness. We then propose an adversarial robustness indicator that evaluates a network's intrinsic robustness without testing its accuracy under adversarial attacks. Guided by this indicator, we further propose a robust training method. Without involving adversarial training, our method enhances a network's intrinsic robustness against various adversarial attacks.
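
For intuition, the sketch below illustrates one common way to visualize a loss surface in input space: evaluate the loss on a grid spanned by the input-gradient-sign direction and a random direction around a single example. This is a minimal, hypothetical illustration rather than the paper's exact procedure; `model`, `x`, `y`, and the helper `input_space_loss_surface` are assumed names for a trained classifier, an input image tensor, and its label.

```python
# Minimal sketch (not the authors' exact procedure): visualize the loss
# surface of a classifier in *input space* around one example, along the
# adversarial (gradient-sign) direction and a random direction.
# Assumptions: `model` is a trained torch.nn.Module, `x` has shape
# (1, C, H, W), and `y` is a LongTensor of shape (1,) with the true label.
import torch
import torch.nn.functional as F

def input_space_loss_surface(model, x, y, radius=0.1, steps=21):
    model.eval()

    # Direction 1: sign of the input gradient (FGSM-style direction).
    x_req = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_req), y)
    grad = torch.autograd.grad(loss, x_req)[0]
    d1 = grad.sign()

    # Direction 2: a random direction, rescaled to a comparable magnitude.
    d2 = torch.randn_like(x)
    d2 = d2 / d2.abs().max().clamp_min(1e-12)

    # Evaluate the loss on a 2-D grid of perturbations around x.
    alphas = torch.linspace(-radius, radius, steps)
    surface = torch.zeros(steps, steps)
    with torch.no_grad():
        for i, a in enumerate(alphas):
            for j, b in enumerate(alphas):
                x_perturbed = x + a * d1 + b * d2
                surface[i, j] = F.cross_entropy(model(x_perturbed), y)
    return alphas, surface  # plot with e.g. matplotlib contourf
```

In such a plot, a sharply curved, cliff-like surface around x suggests that small perturbations can flip the prediction, whereas a flat, slowly varying surface is consistent with the higher intrinsic robustness the abstract describes.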
