You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle

Deep learning achieves state-of-the-art results in many computer vision and natural language processing tasks. However, recent works have shown that deep networks can be vulnerable to adversarial perturbations, which raises serious robustness concerns. Adversarial training, typically formulated as a robust optimization problem, is an effective way of improving the robustness of deep networks. A major drawback of existing adversarial training algorithms is the computational overhead of generating adversarial examples, which is typically far greater than that of training the network itself and makes the overall cost of adversarial training prohibitive. In this paper, we show that adversarial training can be cast as a discrete-time differential game. By analyzing the Pontryagin Maximum Principle (PMP) of this problem, we observe that the adversary update is coupled only with the parameters of the first layer of the network. This inspires us to restrict most of the forward and backward propagation to the first layer of the network during adversary updates, which reduces the number of full forward and backward propagations to only one per group of adversary updates. We therefore call this algorithm YOPO (You Only Propagate Once). Numerical experiments demonstrate that YOPO achieves comparable defense accuracy with approximately 1/5 to 1/4 of the GPU time of the projected gradient descent (PGD) algorithm. Our code is available at https://github.com/a1600012888/YOPO-You-Only-Propagate-Once.
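To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of a YOPO-style m-n update (not the authors' released implementation). Each of the m outer iterations performs one full forward and backward pass, which yields both the weight gradients and the co-state p (the loss gradient at the first layer's output); the n inner adversary updates then propagate only through the first layer. The model split, layer sizes, and the hyperparameters eps, alpha, m, n are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Split the network into its first layer f0 and the remaining layers g,
# so that adversary updates can be restricted to f0 (illustrative sizes).
first_layer = nn.Conv2d(3, 16, 3, padding=1)                    # f0
rest = nn.Sequential(nn.ReLU(), nn.Flatten(),
                     nn.Linear(16 * 32 * 32, 10))               # g
opt = torch.optim.SGD(list(first_layer.parameters()) + list(rest.parameters()),
                      lr=0.1, momentum=0.9)

eps, alpha, m, n = 8 / 255, 2 / 255, 5, 3                       # assumed hyperparameters

def yopo_step(x, y):
    """One training step on a minibatch (x, y) with a YOPO-style inner loop."""
    eta = torch.empty_like(x).uniform_(-eps, eps)               # adversarial perturbation
    opt.zero_grad()
    for _ in range(m):                                          # m full propagations
        eta.requires_grad_(True)
        z = first_layer(x + eta)                                # f0(x + eta)
        loss = F.cross_entropy(rest(z), y)
        # Co-state p = d(loss)/dz from the single full backward pass.
        p = torch.autograd.grad(loss, z, retain_graph=True)[0]
        (loss / m).backward()                                   # accumulate weight gradients
        eta = eta.detach()
        for _ in range(n):                                      # n cheap first-layer-only updates
            eta.requires_grad_(True)
            surrogate = (p * first_layer(x + eta)).sum()        # <p, f0(x + eta)>
            g = torch.autograd.grad(surrogate, eta)[0]
            eta = (eta + alpha * g.sign()).clamp(-eps, eps).detach()
            eta = ((x + eta).clamp(0, 1) - x).detach()          # keep x + eta a valid image
    opt.step()

# Usage on a dummy CIFAR-sized batch:
x = torch.rand(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
yopo_step(x, y)

For comparison, a PGD-K baseline would perform K full forward-backward passes through the entire network for every minibatch, whereas the sketch above touches the full network only m times and keeps the remaining adversary updates inside the first layer.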
