Scaling Adversarial Training to Large Perturbation Bounds

The vulnerability of Deep Neural Networks to Adversarial Attacks has fuelled research towards building robust models. While most Adversarial Training algorithms aim at defending against attacks constrained within low-magnitude Lp norm bounds, real-world adversaries are not limited by such constraints. In this work, we aim to achieve adversarial robustness within larger bounds, against perturbations that may be perceptible but do not change human (or Oracle) prediction. The presence of images that flip Oracle predictions alongside those that do not makes this a challenging setting for adversarial robustness. We discuss the ideal goals of an adversarial defense algorithm beyond perceptual limits, and further highlight the shortcomings of naively extending existing training algorithms to higher perturbation bounds. To overcome these shortcomings, we propose a novel defense, Oracle-Aligned Adversarial Training (OA-AT), which aligns the predictions of the network with those of an Oracle during adversarial training. The proposed approach achieves state-of-the-art performance at large epsilon bounds (such as an L-inf bound of 16/255 on CIFAR-10) while also outperforming existing defenses (AWP, TRADES, PGD-AT) at the standard bound of 8/255.
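To make the setting concrete, the sketch below shows a standard L-inf PGD inner maximization (as in PGD-AT) extended to the larger bound eps = 16/255 discussed above. It is a minimal illustration only, not the OA-AT procedure itself (the Oracle-alignment terms are described in the paper); the names `model`, `x`, `y`, `pgd_attack`, `training_step` and the hyperparameters (alpha = 2/255, 10 steps) are illustrative assumptions, and a PyTorch classifier with inputs in [0, 1] is assumed.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=16/255, alpha=2/255, steps=10):
    """L-inf PGD inner maximization at the larger bound eps = 16/255.
    x: batch of images in [0, 1]; y: ground-truth labels."""
    # Random start inside the L-inf ball, clipped to the valid image range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # gradient-ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project onto the L-inf ball
            x_adv = x_adv.clamp(0.0, 1.0)                          # keep a valid image
    return x_adv.detach()

def training_step(model, optimizer, x, y, eps=16/255):
    """One adversarial-training step on worst-case inputs (PGD-AT-style sketch)."""
    model.eval()   # one common choice: freeze BatchNorm statistics while generating attacks
    x_adv = pgd_attack(model, x, y, eps=eps)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Naively running this recipe at eps = 16/255 is exactly the baseline whose shortcomings the abstract highlights; OA-AT builds on such training by additionally aligning the network's predictions with the Oracle within the larger perturbation ball.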

[1] B. Schiele et al. Relating Adversarially Robust Generalization to Flat Minima, 2021, ICCV.

[2] Timothy A. Mann et al. Fixing Data Augmentation to Improve Adversarial Robustness, 2021, ArXiv.

[3] Luigi V. Mancini et al. Evaluating the Robustness of Geometry-Aware Instance-Reweighted Adversarial Training, 2021, ArXiv.

[4] R. Venkatesh Babu et al. Guided Adversarial Attack for Evaluating and Enhancing Adversarial Defenses, 2020, NeurIPS.

[5] Nicolas Flammarion et al. RobustBench: a standardized adversarial robustness benchmark, 2020, NeurIPS Datasets and Benchmarks.

[6] Timothy A. Mann et al. Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples, 2020, ArXiv.

[7] Hang Su et al. Bag of Tricks for Adversarial Training, 2020, ICLR.

[8] James Bailey et al. Imbalanced Gradients: A New Cause of Overestimated Adversarial Robustness, 2020, ArXiv.

[9] Quanquan Gu et al. RayS: A Ray Searching Method for Hard-label Adversarial Attack, 2020, KDD.

[10] Sahil Singla et al. Perceptual Adversarial Robustness: Defense Against Unseen Threat Models, 2020, ICLR.

[11] Yisen Wang et al. Adversarial Weight Perturbation Helps Robust Generalization, 2020, NeurIPS.

[12] Mohammad Hossein Rohban et al. Towards Deep Learning Models Resistant to Large Perturbations, 2020, ArXiv.

[13] Supriyo Chakraborty et al. Improving Adversarial Robustness Through Progressive Hardening, 2020, ArXiv.

[14] Stefano Ermon et al. Diversity can be Transferred: Output Diversification for White- and Black-box Attacks, 2020, NeurIPS.

[15] Matthias Hein et al. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks, 2020, ICML.

[16] Mohan S. Kankanhalli et al. Attacks Which Do Not Kill Training Make Adversarial Learning Stronger, 2020, ICML.

[17] J. Z. Kolter et al. Overfitting in adversarially robust deep learning, 2020, ICML.

[18] Jorn-Henrik Jacobsen et al. Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations, 2020, ICML.

[19] Nicolas Flammarion et al. Square Attack: a query-efficient black-box adversarial attack via random search, 2019, ECCV.

[20] Tom Goldstein et al. Instance adaptive adversarial training: Improved accuracy tradeoffs in neural nets, 2019, ArXiv.

[21] Matthias Hein et al. Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack, 2019, ICML.

[22] Quoc V. Le et al. AutoAugment: Learning Augmentation Strategies From Data, 2019, CVPR.

[23] Seong Joon Oh et al. CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features, 2019, ICCV.

[24] Aleksander Madry et al. On Evaluating Adversarial Robustness, 2019, ArXiv.

[25] Michael I. Jordan et al. Theoretically Principled Trade-off between Robustness and Accuracy, 2019, ICML.

[26] Aleksander Madry et al. Robustness May Be at Odds with Accuracy, 2018, ICLR.

[27] Andrew Gordon Wilson et al. Averaging Weights Leads to Wider Optima and Better Generalization, 2018, UAI.

[28] Pushmeet Kohli et al. Adversarial Risk and the Dangers of Evaluating Against Weak Attacks, 2018, ICML.

[29] David A. Wagner et al. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, 2018, ICML.

[30] Alexei A. Efros et al. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric, 2018, CVPR.

[31] Jinfeng Yi et al. ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models, 2017, AISec@CCS.

[32] Aleksander Madry et al. Towards Deep Learning Models Resistant to Adversarial Attacks, 2017, ICLR.

[33] Dan Boneh et al. Ensemble Adversarial Training: Attacks and Defenses, 2017, ICLR.

[34] David A. Wagner et al. Defensive Distillation is Not Robust to Adversarial Examples, 2016, ArXiv.

[35] Nikos Komodakis et al. Wide Residual Networks, 2016, BMVC.

[36] Jian Sun et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.

[37] Ananthram Swami et al. Practical Black-Box Attacks against Machine Learning, 2016, AsiaCCS.

[38] Jian Sun et al. Deep Residual Learning for Image Recognition, 2015, CVPR 2016.

[39] Jonathon Shlens et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[40] Joan Bruna et al. Intriguing properties of neural networks, 2013, ICLR.

[41] Shiyu Chang et al. Robust Overfitting may be mitigated by properly learned smoothening, 2021, ICLR.

[42] Andrew Y. Ng et al. Reading Digits in Natural Images with Unsupervised Feature Learning, 2011.

[43] Alex Krizhevsky. Learning Multiple Layers of Features from Tiny Images, 2009.