Enhancing Adversarial Defense by k-Winners-Take-All

We propose a simple change to existing neural network structures to better defend against gradient-based adversarial attacks. Instead of popular activation functions such as ReLU, we advocate the k-Winners-Take-All (k-WTA) activation, a C⁰-discontinuous function that purposely invalidates the network's gradient at densely distributed input data points. The proposed k-WTA activation can be readily used in nearly all existing networks and training methods with no significant overhead. Our proposal is theoretically rationalized: we analyze why the discontinuities in k-WTA networks can largely prevent gradient-based search for adversarial examples, and why they at the same time remain innocuous to network training. This understanding is also empirically backed. We test the k-WTA activation on various network structures, each optimized by a training method with or without adversarial training. In all cases, the robustness of k-WTA networks exceeds that of traditional networks under white-box attacks.
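
To make the proposal concrete, below is a minimal PyTorch sketch of what a k-WTA activation layer could look like: for each sample it keeps the k largest activations and zeros out the rest, so the layer acts as a drop-in replacement for ReLU. The class name KWTA, the sparsity_ratio parameter, and the choice to flatten each sample before selecting winners are illustrative assumptions for this sketch, not the paper's released implementation.

```python
import torch
import torch.nn as nn


class KWTA(nn.Module):
    """Illustrative k-Winners-Take-All activation (a sketch, not the authors' released code).

    For each sample, keep the k largest activations of the flattened feature
    map and zero out the rest, with k = max(1, floor(sparsity_ratio * n)).
    """

    def __init__(self, sparsity_ratio: float = 0.1):
        super().__init__()
        self.sparsity_ratio = sparsity_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        flat = x.flatten(start_dim=1)                       # (batch, n_activations)
        k = max(1, int(self.sparsity_ratio * flat.shape[1]))
        # Per-sample value of the k-th largest activation (topk sorts descending).
        threshold = flat.topk(k, dim=1).values[:, -1:]      # (batch, 1)
        # Keep activations at or above the threshold; ties may keep slightly more than k.
        mask = (flat >= threshold).to(x.dtype)
        return (flat * mask).view_as(x)


# Hypothetical usage: drop-in replacement for ReLU in a small convolutional block.
block = nn.Sequential(nn.Conv2d(3, 64, kernel_size=3, padding=1), KWTA(0.1))
out = block(torch.randn(8, 3, 32, 32))   # output shape: (8, 64, 32, 32)
```

Because the set of winning units changes abruptly as the input varies, the resulting function is piecewise linear but discontinuous, which is the source of the gradient invalidation described in the abstract.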
