Understanding Robustness in Teacher-Student Setting: A New Perspective

Adversarial examples have emerged as a ubiquitous property of machine learning models: a bounded adversarial perturbation can mislead a model into making arbitrarily incorrect predictions. Such examples provide a way to assess the robustness of machine learning models, as well as a proxy for understanding the model training process. Extensive studies have tried to explain the existence of adversarial examples and to improve model robustness, e.g., through adversarial training. Different from prior works, which mostly focus on models trained on datasets with predefined labels, we leverage the teacher-student framework and assume a teacher model, or oracle, that provides the labels for given instances. In this setting, we extend Tian (2019) to the case of low-rank input data and show that student specialization (a trained student neuron becoming highly correlated with some teacher neuron at the same layer) still occurs within the input subspace, but the teacher and student nodes can differ wildly outside the data subspace, which we conjecture leads to adversarial examples. Extensive experiments show that student specialization correlates strongly with model robustness in different scenarios, including students trained via standard training, adversarial training, confidence-calibrated adversarial training, and training on the robust-feature dataset. Our study could shed light on the future exploration of adversarial examples and on potential approaches to enhancing model robustness via principled data augmentation.
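
As a concrete illustration of the setup described above, the following is a minimal sketch (not the authors' code) of a teacher-student experiment with low-rank inputs: a student network is trained to match a fixed teacher on inputs drawn from a rank-r subspace, and per-neuron correlations between student and teacher hidden activations are then compared on inputs inside versus outside that subspace. The network widths, subspace rank, and training budget are illustrative assumptions, not values from the paper.

```python
# Minimal teacher-student sketch with low-rank input data (illustrative only).
import torch
import torch.nn as nn

torch.manual_seed(0)
d, r, hidden = 20, 5, 16          # ambient dim, data-subspace rank, hidden width (assumed)

# Orthonormal basis of the low-rank data subspace.
basis, _ = torch.linalg.qr(torch.randn(d, r))

def sample_inputs(n, in_subspace=True):
    """Draw inputs either inside the rank-r subspace or in its orthogonal complement."""
    if in_subspace:
        return torch.randn(n, r) @ basis.T
    x = torch.randn(n, d)
    return x - (x @ basis) @ basis.T   # project out the data subspace

def two_layer_relu(width):
    return nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 1))

teacher = two_layer_relu(hidden)   # fixed "oracle" providing labels
student = two_layer_relu(hidden)
for p in teacher.parameters():
    p.requires_grad_(False)

# Standard (non-adversarial) training: regress student outputs onto teacher labels,
# using only inputs that lie in the data subspace.
opt = torch.optim.Adam(student.parameters(), lr=1e-2)
for _ in range(2000):
    x = sample_inputs(256, in_subspace=True)
    loss = ((student(x) - teacher(x)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def max_neuron_correlation(x):
    """For each teacher hidden neuron, the largest |correlation| with any student neuron."""
    with torch.no_grad():
        t = torch.relu(teacher[0](x))   # teacher hidden activations
        s = torch.relu(student[0](x))   # student hidden activations
    t = (t - t.mean(0)) / (t.std(0) + 1e-8)
    s = (s - s.mean(0)) / (s.std(0) + 1e-8)
    corr = (t.T @ s) / len(x)           # teacher-by-student correlation matrix
    return corr.abs().max(dim=1).values

print("specialization on the data subspace :", max_neuron_correlation(sample_inputs(4096, True)).mean().item())
print("specialization off the data subspace:", max_neuron_correlation(sample_inputs(4096, False)).mean().item())
```

Under the paper's conjecture, the on-subspace correlations should be close to 1 (specialization), while the off-subspace correlations can be much lower, since student and teacher neurons only need to agree where the training data lives.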

[1] James Bailey et al. Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality. ICLR, 2018.

[2] Mingyan Liu et al. Generating Adversarial Examples with Adversarial Networks. IJCAI, 2018.

[3] Aleksander Madry et al. Robustness May Be at Odds with Accuracy. ICLR, 2019.

[4] David A. Wagner et al. Towards Evaluating the Robustness of Neural Networks. IEEE Symposium on Security and Privacy (SP), 2017.

[5] Yuandong Tian et al. Student Specialization in Deep Rectified Networks With Finite Width and Input Dimension. ICML, 2020.

[6] Abhishek Sinha et al. Understanding Adversarial Space Through the Lens of Attribution. Nemesis/UrbReas/SoGood/IWAISe/GDM@PKDD/ECML, 2018.

[7] Volker Tresp et al. Saliency Methods for Explaining Adversarial Attacks. arXiv, 2019.

[8] Nicolas Macris et al. The committee machine: computational to statistical gaps in learning a two-layers neural network. NeurIPS, 2018.

[9] Bernt Schiele et al. Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks. ICML, 2020.

[10] E. Gardner et al. Three unfinished works on the optimal storage capacity of networks, 1989.

[11] A. Adam Ding et al. Understanding and Quantifying Adversarial Examples Existence in Linear Classification. arXiv, 2019.

[12] Quoc V. Le et al. Smooth Adversarial Training. arXiv, 2020.

[13] Karl Pearson. LIII. On lines and planes of closest fit to systems of points in space, 1901.

[14] David Saad et al. Online Learning in Radial Basis Function Networks. Neural Computation, 1997.

[15] Aleksander Madry et al. Towards Deep Learning Models Resistant to Adversarial Attacks. ICLR, 2018.

[16] Danilo Vasconcellos Vargas et al. Understanding the One Pixel Attack: Propagation Maps and Locality Analysis. AISafety@IJCAI, 2019.

[17] Adi Shamir et al. A Simple Explanation for the Existence of Adversarial Examples with Small Hamming Distance. arXiv, 2019.

[18] Larry S. Davis et al. Adversarial Training for Free! NeurIPS, 2019.

[19] Yi Sun et al. Testing Robustness Against Unforeseen Adversaries. arXiv, 2019.

[20] Christian Van den Broeck et al. Statistical Mechanics of Learning, 2001.

[21] Joan Bruna et al. Intriguing properties of neural networks. ICLR, 2014.

[22] Shashank Kotyan et al. Representation Quality of Neural Networks Links to Adversarial Attacks and Defences, 2019.

[23] Bo Li et al. Big but Imperceptible Adversarial Perturbations via Semantic Manipulation. arXiv, 2019.

[24] Alex Krizhevsky et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[25] Aleksander Madry et al. Adversarial Examples Are Not Bugs, They Are Features. NeurIPS, 2019.

[26] Geoffrey E. Hinton et al. Distilling the Knowledge in a Neural Network. arXiv, 2015.

[27] Yuanzhi Li et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers. NeurIPS, 2019.

[28] Dylan Hadfield-Menell et al. On the Geometry of Adversarial Examples. arXiv, 2018.

[29] David A. Wagner et al. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML, 2018.

[30] Eduardo Valle et al. Exploring the space of adversarial images. International Joint Conference on Neural Networks (IJCNN), 2016.

[31] Jonathon Shlens et al. Explaining and Harnessing Adversarial Examples. ICLR, 2015.

[32] Patrick D. McDaniel et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples. arXiv, 2016.

[33] Mingyan Liu et al. Spatially Transformed Adversarial Examples. ICLR, 2018.

[34] Anthony C. C. Coolen et al. Statistical mechanical analysis of the dynamics of learning in perceptrons. Statistics and Computing, 1997.

[35] J. Zico Kolter et al. Wasserstein Adversarial Examples via Projected Sinkhorn Iterations. ICML, 2019.

[36] Dawn Xiaodong Song et al. Exploring the Space of Black-box Attacks on Deep Neural Networks. arXiv, 2017.

[37] Hao Chen et al. Explore the Transformation Space for Adversarial Images. CODASPY, 2020.

[38] Dan Boneh et al. The Space of Transferable Adversarial Examples. arXiv, 2017.

[39] Yuandong Tian et al. Luck Matters: Understanding Training Dynamics of Deep ReLU Networks. arXiv, 2019.

[40] David Saad et al. Dynamics of On-Line Gradient Descent Learning for Multilayer Neural Networks. NIPS, 1995.

[41] Inderjit S. Dhillon et al. The Limitations of Adversarial Training and the Blind-Spot Attack. ICLR, 2019.