RobustBench: a standardized adversarial robustness benchmark

Evaluation of adversarial robustness is often error-prone, leading to overestimation of the true robustness of models. While adaptive attacks designed for a particular defense are a way out of this, there are only approximate guidelines on how to perform them. Moreover, adaptive evaluations are highly customized for particular models, which makes it difficult to compare different defenses. Our goal is to establish a standardized benchmark of adversarial robustness, which reflects the robustness of the considered models as accurately as possible within a reasonable computational budget. This requires imposing some restrictions on the admitted models to rule out defenses that only make gradient-based attacks ineffective without improving actual robustness. We evaluate the robustness of models for our benchmark with AutoAttack, an ensemble of white- and black-box attacks that was recently shown in a large-scale study to improve almost all robustness evaluations compared to the original publications. Our leaderboard, hosted at https://robustbench.github.io, aims to reflect the current state of the art on a set of well-defined tasks in $\ell_\infty$- and $\ell_2$-threat models, with possible extensions in the future. Additionally, we open-source the library robustbench (https://github.com/RobustBench/robustbench), which provides unified access to state-of-the-art robust models to facilitate their downstream applications. Finally, based on the collected models, we analyze general trends in $\ell_p$-robustness and its impact on other tasks such as robustness to various distribution shifts and out-of-distribution detection.
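
To illustrate how the two pieces mentioned above fit together, here is a minimal sketch of loading a leaderboard model through the robustbench library and evaluating it with the AutoAttack ensemble. It assumes the `load_model` and `load_cifar10` helpers from the robustbench package and the standalone `autoattack` package; exact signatures may differ across versions, and the model name and subset size are purely illustrative (the official benchmark evaluates on the full test set).

```python
# Minimal sketch (assumed API; signatures may differ across library versions)
# of loading a leaderboard model via robustbench and evaluating it with AutoAttack.
import torch

from robustbench.utils import load_model   # assumed helper from the robustbench package
from robustbench.data import load_cifar10  # assumed data loader
from autoattack import AutoAttack          # standalone AutoAttack package

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load a model listed on the leaderboard for the CIFAR-10 Linf threat model.
model = load_model(model_name='Standard', dataset='cifar10',
                   threat_model='Linf').to(device).eval()

# Small evaluation subset for illustration only.
x_test, y_test = load_cifar10(n_examples=256)
x_test, y_test = x_test.to(device), y_test.to(device)

# AutoAttack ensemble (APGD-CE, APGD-T, FAB-T, Square) at the standard eps = 8/255.
adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='standard')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=128)

# Robust accuracy = fraction of points still classified correctly on adversarial inputs.
with torch.no_grad():
    robust_acc = (model(x_adv).argmax(dim=1) == y_test).float().mean().item()
print(f'Robust accuracy on {len(y_test)} examples: {robust_acc:.1%}')
```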
