Fast Certified Robust Training with Short Warmup

State-of-the-art (SOTA) methods for certified robust training, including interval bound propagation (IBP) and CROWN-IBP, usually rely on a long warmup schedule with hundreds or thousands of epochs and are thus costly. In this paper, we identify two issues that make certified training difficult and unstable and that previously necessitated a long warmup: exploded bounds at initialization and an imbalance in ReLU activation states. For fast training with a short warmup, we propose three improvements: a weight initialization tailored to IBP training, fully adding Batch Normalization (BN), and regularization during warmup that tightens certified bounds and balances ReLU activation states. With only a short warmup, we already outperform SOTA results from the literature trained with hundreds or thousands of epochs under the same network architectures.
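
To make the training objective concrete, below is a minimal sketch of how IBP computes certified output bounds by propagating an interval around each input through the network; the network architecture, layer sizes, and perturbation radius are illustrative assumptions, not the exact settings used in the paper.

```python
# Minimal sketch of interval bound propagation (IBP) in PyTorch.
import torch
import torch.nn as nn

def ibp_linear(linear: nn.Linear, lb: torch.Tensor, ub: torch.Tensor):
    """Propagate an elementwise interval [lb, ub] through a linear layer."""
    center = (ub + lb) / 2
    radius = (ub - lb) / 2
    new_center = linear(center)                          # W @ mu + b
    new_radius = radius.matmul(linear.weight.abs().t())  # |W| @ r
    return new_center - new_radius, new_center + new_radius

def ibp_forward(layers, x, eps):
    """Certified output bounds for inputs within an l_inf ball of radius eps."""
    lb, ub = x - eps, x + eps
    for layer in layers:
        if isinstance(layer, nn.Linear):
            lb, ub = ibp_linear(layer, lb, ub)
        elif isinstance(layer, nn.ReLU):
            # ReLU is monotone, so bounds pass through elementwise.
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)
    return lb, ub

# Usage: bounds on the logits of a toy two-layer network (hypothetical sizes).
net = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
x = torch.rand(8, 784)
lb, ub = ibp_forward(net, x, eps=0.1)
```

Certified training minimizes a loss on the worst-case logits implied by these bounds; when the bounds explode at initialization or most ReLU intervals fall on one side of zero, this loss becomes hard to optimize, which is the instability the proposed initialization, BN, and warmup regularization are meant to address.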
