Enhancing Adversarial Defense by k-Winners-Take-All

We propose a simple change to existing neural network structures to better defend against gradient-based adversarial attacks. Instead of popular activation functions such as ReLU, we advocate the k-Winners-Take-All (k-WTA) activation, a C⁰-discontinuous function that purposely invalidates the network's gradient at densely distributed input data points. The proposed k-WTA activation can be readily used in nearly all existing networks and training methods with no significant overhead. Our proposal is theoretically rationalized: we analyze why the discontinuities in k-WTA networks can largely prevent gradient-based search for adversarial examples, and why they at the same time remain innocuous to network training. This understanding is also empirically backed. We test the k-WTA activation on various network structures, each optimized by a training method with or without adversarial training. In all cases, the robustness of k-WTA networks exceeds that of traditional networks under white-box attacks.
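
To make the proposal concrete, below is a minimal PyTorch sketch of what a k-WTA activation layer could look like: for each sample it keeps the k largest activations and zeros out the rest, so the layer acts as a drop-in replacement for ReLU. The class name KWTA, the sparsity_ratio parameter, and the choice to flatten each sample before selecting winners are illustrative assumptions for this sketch, not the paper's released implementation.

```python
import torch
import torch.nn as nn


class KWTA(nn.Module):
    """Illustrative k-Winners-Take-All activation (a sketch, not the authors' released code).

    For each sample, keep the k largest activations of the flattened feature
    map and zero out the rest, with k = max(1, floor(sparsity_ratio * n)).
    """

    def __init__(self, sparsity_ratio: float = 0.1):
        super().__init__()
        self.sparsity_ratio = sparsity_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        flat = x.flatten(start_dim=1)                       # (batch, n_activations)
        k = max(1, int(self.sparsity_ratio * flat.shape[1]))
        # Per-sample value of the k-th largest activation (topk sorts descending).
        threshold = flat.topk(k, dim=1).values[:, -1:]      # (batch, 1)
        # Keep activations at or above the threshold; ties may keep slightly more than k.
        mask = (flat >= threshold).to(x.dtype)
        return (flat * mask).view_as(x)


# Hypothetical usage: drop-in replacement for ReLU in a small convolutional block.
block = nn.Sequential(nn.Conv2d(3, 64, kernel_size=3, padding=1), KWTA(0.1))
out = block(torch.randn(8, 3, 32, 32))   # output shape: (8, 64, 32, 32)
```

Because the set of winning units changes abruptly as the input varies, the resulting function is piecewise linear but discontinuous, which is the source of the gradient invalidation described in the abstract.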
