The shape and simplicity biases of adversarially robust ImageNet-trained CNNs.

Adversarial training has been the subject of dozens of studies and remains a leading method for defending against adversarial attacks. Yet, it is still largely unknown (a) how adversarially robust ImageNet classifiers (R classifiers) generalize to out-of-distribution examples, and (b) how their generalization capability relates to their hidden representations. In this paper, we perform a thorough, systematic study of these two questions across the AlexNet, GoogLeNet, and ResNet-50 architectures. We find that while standard ImageNet classifiers have a strong texture bias, their R counterparts rely heavily on shape. Remarkably, adversarial training induces three simplicity biases in hidden neurons while 'robustifying' the network. That is, a convolutional neuron in an R network often shifts to detecting (1) pixel-wise smoother patterns, i.e. a mechanism that blocks high-frequency noise from passing through the network; (2) lower-level features, i.e. textures and colors rather than objects; and (3) fewer types of inputs. Our findings reveal mechanisms that make networks more adversarially robust and also explain several recent findings.
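For readers unfamiliar with how such robust (R) classifiers are typically obtained, the sketch below shows one step of PGD-based adversarial training. This is a minimal illustration assuming PyTorch; the abstract does not specify the paper's training recipe, and the hyperparameters (eps, alpha, steps) here are illustrative placeholders, not the settings used in the study.

```python
# Minimal sketch of one PGD adversarial-training step (assumed setup, not the
# paper's exact configuration). Hyperparameters below are placeholders.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient descent within an L-infinity ball of radius eps."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv

def adversarial_training_step(model, optimizer, x, y):
    """Train the network on adversarial examples instead of clean ones."""
    model.eval()                      # keep batch-norm statistics fixed while attacking
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Running this inner-maximization/outer-minimization loop over an ImageNet classifier is what, per the abstract's findings, gradually pushes hidden neurons toward smoother, lower-level, and more selective feature detectors.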
