Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them

Making classifiers robust to adversarial examples is challenging. Thus, many works tackle the seemingly easier task of detecting perturbed inputs. We show a barrier towards this goal. We prove a hardness reduction between detection and classification of adversarial examples: given a robust detector for attacks at distance ε (in some metric), we show how to build a similarly robust (but computationally inefficient) classifier for attacks at distance ε/2. Our reduction is computationally inefficient, but preserves the sample complexity of the original detector. The reduction thus cannot be directly used to build practical classifiers. Instead, it is a useful sanity check to test whether empirical detection results imply something much stronger than the authors presumably anticipated (namely a highly robust and data-efficient classifier). To illustrate, we revisit 14 empirical detector defenses published in recent years. For 12 of the 14 defenses, we show that the claimed detection results imply an inefficient classifier with robustness far beyond the state-of-the-art.
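To make the flavor of the reduction concrete, below is a minimal sketch under illustrative assumptions: the callables `detect` and `classify`, the finite candidate set `candidate_points`, and the choice of the ℓ2 metric are all hypothetical placeholders, not the paper's exact construction.

```python
import numpy as np

def robust_classify(x, detect, classify, eps, candidate_points):
    """Illustrative sketch: use a detector robust at distance eps to label
    inputs robustly at distance eps / 2 (names and metric are assumptions).

    `detect(z)` returns True if z is flagged as adversarial; `classify(z)`
    returns the base model's label. `candidate_points` is a finite
    enumeration of a (discretized) input domain -- brute-forcing it is
    what makes the reduction computationally inefficient.
    """
    for z in candidate_points:
        # Look for any nearby point that the detector accepts.
        if np.linalg.norm(z - x) <= eps / 2 and not detect(z):
            # If x lies within eps/2 of a clean point x0, then z lies within
            # eps of x0 (triangle inequality), so a detector robust at
            # distance eps can only accept z if classify(z) is x0's label.
            return classify(z)
    # No accepted point within eps/2 of x: abstain.
    return None
```

The brute-force search over `candidate_points` is exactly the source of the computational inefficiency mentioned above; no new training data is needed, which is why the sample complexity of the original detector is preserved.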
