What Do Deep Nets Learn? Class-wise Patterns Revealed in the Input Space

Deep neural networks (DNNs) are increasingly deployed across applications to achieve state-of-the-art performance. However, they are often applied as black boxes, with limited understanding of what knowledge the model has learned from the data. In this paper, we focus on image classification and propose a method to visualize and understand the class-wise knowledge (patterns) learned by DNNs under three settings: natural, backdoor, and adversarial. Unlike existing visualization methods, ours searches for a single predictive pattern in the pixel space that represents the knowledge the model has learned for each class. Based on the proposed method, we show that DNNs trained on natural (clean) data learn abstract shapes along with some texture, while backdoored models learn a suspicious pattern for the backdoored class. Interestingly, the fact that DNNs can learn a single predictive pattern for each class suggests that they can learn a backdoor even from clean data, with the pattern itself acting as the trigger. In the adversarial setting, we show that adversarially trained models tend to learn more simplified shape patterns. Our method can serve as a useful tool for better understanding the knowledge learned by DNNs across different datasets and training settings.
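
To make the idea of searching for a single class-wise predictive pattern in the pixel space concrete, below is a minimal sketch of such an input-space search. It is not the authors' exact objective: the blending scheme, the weight `alpha`, and the helper name `search_class_pattern` are assumptions for illustration; only `model`, `data_loader`, and `target_class` are placeholders the reader supplies.

```python
# Minimal sketch (assumed objective, not the paper's exact method): optimize a
# single pixel-space pattern so that, when blended onto natural images, the
# model predicts a chosen target class.
import torch
import torch.nn.functional as F


def search_class_pattern(model, data_loader, target_class,
                         image_shape=(3, 32, 32), steps=500, lr=0.1,
                         alpha=0.2, device="cpu"):
    """Return a pattern in [-1, 1] that is predictive of `target_class`."""
    model.eval()
    model.to(device)
    # The pattern is a free variable in the input (pixel) space.
    pattern = torch.zeros(1, *image_shape, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([pattern], lr=lr)

    data_iter = iter(data_loader)
    for _ in range(steps):
        try:
            images, _ = next(data_iter)
        except StopIteration:
            data_iter = iter(data_loader)
            images, _ = next(data_iter)
        images = images.to(device)

        # Blend the candidate pattern onto natural images (one plausible
        # choice; the exact mixing scheme in the paper may differ).
        blended = (1 - alpha) * images + alpha * torch.tanh(pattern)
        logits = model(blended)
        target = torch.full((images.size(0),), target_class,
                            dtype=torch.long, device=device)

        # Encourage the blended images to be classified as the target class.
        loss = F.cross_entropy(logits, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return torch.tanh(pattern).detach()
```

If a pattern found this way reliably flips clean images of other classes to the target class, it behaves like a backdoor trigger, which matches the paper's observation that such a trigger can emerge even from clean training data.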
