Are Perceptually-Aligned Gradients a General Property of Robust Classifiers?

For a standard convolutional neural network, optimizing over the input pixels to maximize the score of some target class will generally produce a grainy-looking version of the original image. However, Santurkar et al. (2019) demonstrated that for adversarially-trained neural networks, this optimization produces images that uncannily resemble the target class. In this paper, we show that these "perceptually-aligned gradients" also occur under randomized smoothing, an alternative means of constructing adversarially-robust classifiers. Our finding supports the hypothesis that perceptually-aligned gradients may be a general property of robust classifiers. We hope that our results will inspire research aimed at explaining this link between perceptually-aligned gradients and adversarial robustness.
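To make the procedure concrete, below is a minimal sketch (not the authors' code) of the pixel-space optimization the abstract describes, applied to a Monte-Carlo approximation of a Gaussian-smoothed classifier. The model, noise level, step size, sample count, and class index are all illustrative assumptions; the paper's own experiments use adversarially-trained and smoothed ResNets on ImageNet.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

def smoothed_class_score(model, x, target, sigma=0.25, n_samples=16):
    """Monte-Carlo estimate of the target-class probability of the
    Gaussian-smoothed classifier g(x) = E_{eps ~ N(0, sigma^2 I)}[f(x + eps)]."""
    noise = torch.randn(n_samples, *x.shape[1:], device=x.device) * sigma
    probs = F.softmax(model(x + noise), dim=1)   # (n_samples, n_classes)
    return probs.mean(dim=0)[target]             # scalar score estimate

def maximize_target_class(model, x_init, target, steps=100, lr=0.1, sigma=0.25):
    """Gradient ascent on the input pixels to increase the smoothed
    classifier's score for `target`, starting from `x_init`."""
    x = x_init.clone().detach().requires_grad_(True)
    for _ in range(steps):
        score = smoothed_class_score(model, x, target, sigma=sigma)
        grad, = torch.autograd.grad(score, x)
        with torch.no_grad():
            x += lr * grad / (grad.norm() + 1e-12)  # normalized ascent step
            x.clamp_(0.0, 1.0)                      # stay in valid pixel range
    return x.detach()

if __name__ == "__main__":
    # Stand-in model for illustration only; perceptual alignment is the claim
    # for robust (adversarially-trained or smoothed) classifiers.
    model = models.resnet18(weights=None).eval()
    x0 = torch.rand(1, 3, 224, 224)                        # random seed image
    x_vis = maximize_target_class(model, x0, target=207)   # example ImageNet class index
```

For a standard (non-robust) network this optimization tends to produce high-frequency, grainy perturbations of the seed image; the paper's finding is that when the classifier is made robust via randomized smoothing, the same optimization instead yields images that visibly resemble the target class.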

[1] Yuchen Zhang, et al. Defending against Whitebox Adversarial Attacks via Randomized Discretization, 2019, AISTATS.

[2] Aleksander Madry, et al. Robustness May Be at Odds with Accuracy, 2018, ICLR.

[3] Xiaoyu Cao, et al. Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification, 2017, ACSAC.

[4] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.

[5] Suman Jana, et al. Certified Robustness to Adversarial Examples with Differential Privacy, 2019, IEEE Symposium on Security and Privacy (SP).

[6] Deborah Silver, et al. Feature Visualization, 1994, Scientific Visualization.

[7] Samy Bengio, et al. Adversarial Machine Learning at Scale, 2016, ICLR.

[8] Aleksander Madry, et al. Adversarial Robustness as a Prior for Learned Representations, 2019.

[9] Greg Yang, et al. Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers, 2019, NeurIPS.

[10] Lawrence Carin, et al. Certified Adversarial Robustness with Additive Gaussian Noise, 2019, NeurIPS.

[11] David A. Wagner, et al. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, 2018, ICML.

[12] Hisashi Kashima, et al. Theoretical evidence for adversarial robustness through randomization: the case of the Exponential family, 2019, NeurIPS.

[13] Cho-Jui Hsieh, et al. A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks, 2019, NeurIPS.

[14] Alexander Levine, et al. Certifiably Robust Interpretation in Deep Learning, 2019, ArXiv.

[15] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.

[16] Ludwig Schmidt, et al. Unlabeled Data Improves Adversarial Robustness, 2019, NeurIPS.

[17] Fabio Roli, et al. Evasion Attacks against Machine Learning at Test Time, 2013, ECML/PKDD.

[18] Joan Bruna, et al. Intriguing properties of neural networks, 2013, ICLR.

[19] Aleksander Madry, et al. Image Synthesis with a Single (Robust) Classifier, 2019, NeurIPS.

[20] Matthias Bethge, et al. Accurate, reliable and fast robustness evaluation, 2019, NeurIPS.

[21] Yang Song, et al. Improving the Robustness of Deep Neural Networks via Stability Training, 2016, CVPR.

[22] Aleksander Madry, et al. Towards Deep Learning Models Resistant to Adversarial Attacks, 2017, ICLR.

[23] J. Zico Kolter, et al. Certified Adversarial Robustness via Randomized Smoothing, 2019, ICML.

[24] Andrea Vedaldi, et al. Understanding deep image representations by inverting them, 2015, CVPR.

[25] Jason Yosinski, et al. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images, 2015, CVPR.

[26] James Kuelbs, et al. Some Shift Inequalities for Gaussian Measures, 1998.

[27] Cho-Jui Hsieh, et al. Towards Robust Neural Networks via Random Self-ensemble, 2017, ECCV.