Boosting Adversarial Training with Hypersphere Embedding

Adversarial training (AT) is one of the most effective defenses against adversarial attacks for deep learning models. In this work, we advocate incorporating the hypersphere embedding (HE) mechanism into the AT procedure by regularizing the features onto compact manifolds, which constitutes a lightweight yet effective module that brings the strengths of representation learning into AT. Our extensive analyses reveal that AT and HE are well coupled, benefiting the robustness of adversarially trained models in several respects. We validate the effectiveness and adaptability of HE by embedding it into popular AT frameworks, including PGD-AT, ALP, and TRADES, as well as the FreeAT and FastAT training strategies. In the experiments, we evaluate our methods under a wide range of adversarial attacks on the CIFAR-10 and ImageNet datasets, verifying that integrating HE consistently enhances model robustness in each AT framework at little extra computational cost.
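To make the HE mechanism concrete, below is a minimal PyTorch sketch of one common hypersphere-embedding head: both the class weights and the penultimate features are L2-normalized so the logits become cosine similarities, rescaled by a factor s, with an additive margin m subtracted from the true-class cosine (in the spirit of NormFace/CosFace-style heads). The class name and the values s=15.0 and m=0.2 are illustrative assumptions, not the paper's exact hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HypersphereClassifier(nn.Module):
    """Illustrative hypersphere-embedding head (names/values assumed).

    Logits are scaled cosine similarities between L2-normalized
    features and L2-normalized class weights, so all features live
    on a hypersphere; an additive margin m is subtracted from the
    true-class cosine during training.
    """

    def __init__(self, feat_dim, num_classes, s=15.0, m=0.2):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, feat_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s = s  # scale applied to cosine logits
        self.m = m  # margin on the true-class cosine

    def forward(self, features, labels=None):
        # Feature normalization + weight normalization -> cosine logits.
        cos = F.linear(F.normalize(features, dim=1),
                       F.normalize(self.weight, dim=1))
        if labels is not None:
            # Subtract the margin only from the true-class entries.
            one_hot = F.one_hot(labels, cos.size(1)).to(cos.dtype)
            cos = cos - self.m * one_hot
        return self.s * cos
```

In use, the head simply replaces the final linear layer: `loss = F.cross_entropy(head(backbone(x), y), y)`, so the standard softmax cross-entropy training pipeline is unchanged apart from the normalization and margin.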

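A hedged sketch of how such a head could slot into the PGD-AT inner/outer optimization follows, assuming the HypersphereClassifier from the previous sketch: the inner maximization crafts l_inf-bounded adversarial examples against the same hypersphere loss that the outer minimization trains on. The function name pgd_at_loss and the defaults eps=8/255, alpha=2/255, steps=10 are illustrative, not the paper's reported settings.

```python
import torch
import torch.nn.functional as F

def pgd_at_loss(backbone, head, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft l_inf PGD adversarial examples against the hypersphere
    head, then return the training loss on those examples (sketch)."""
    # Random start inside the eps-ball, kept in the valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(head(backbone(x_adv), y), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                     # ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # project to ball
            x_adv = x_adv.clamp(0, 1)
    # Outer minimization: train on the adversarial examples.
    return F.cross_entropy(head(backbone(x_adv), y), y)
```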