PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking

Localized adversarial patches aim to induce misclassification in machine learning models by arbitrarily modifying pixels within a restricted region of an image. Such attacks can be realized in the physical world by attaching the adversarial patch to the object to be misclassified, and defending against them remains an open problem. In this paper, we propose a general defense framework called PatchGuard that achieves high provable robustness against localized adversarial patches while maintaining high clean accuracy. The cornerstone of PatchGuard is the use of CNNs with small receptive fields, which bound the number of features that an adversarial patch can corrupt. Given this bound, the problem of designing an adversarial patch defense reduces to that of designing a secure feature aggregation mechanism. To this end, we present robust masking, a defense that detects and masks the corrupted features to recover the correct prediction. Notably, we can prove the robustness of our defense against any adversary within our threat model. Our extensive evaluation on the ImageNet, ImageNette (a 10-class subset of ImageNet), and CIFAR-10 datasets demonstrates that our defense achieves state-of-the-art provable robust accuracy and clean accuracy.
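
To make the secure feature aggregation step concrete, the sketch below illustrates a simplified robust-masking aggregation in Python/NumPy. It is a minimal sketch under stated assumptions, not the authors' reference implementation: it assumes a small-receptive-field CNN (e.g., a BagNet-style model) that outputs per-location class evidence of shape (H, W, num_classes), and the hypothetical window parameter stands in for the number of feature cells a patch can corrupt; the paper's full detection and certification procedures are more involved.

import numpy as np

def robust_masking(local_logits, window=6):
    """Simplified robust-masking aggregation over per-location class evidence.

    local_logits: array of shape (H, W, num_classes) holding per-location
        class evidence from a small-receptive-field CNN.
    window: side length (in feature cells) of the region a patch can corrupt.
    """
    # Clip evidence to be non-negative so a patch cannot inject unbounded
    # negative evidence against the true class.
    evidence = np.clip(local_logits, 0, None)
    H, W, num_classes = evidence.shape

    masked_sums = np.empty(num_classes)
    for c in range(num_classes):
        class_map = evidence[:, :, c]
        # Locate the window with the highest class evidence -- the region a
        # localized patch would most plausibly have corrupted.
        best, best_pos = -1.0, (0, 0)
        for i in range(H - window + 1):
            for j in range(W - window + 1):
                s = class_map[i:i + window, j:j + window].sum()
                if s > best:
                    best, best_pos = s, (i, j)
        # Mask (zero out) that window and aggregate the remaining evidence.
        i, j = best_pos
        masked = class_map.copy()
        masked[i:i + window, j:j + window] = 0.0
        masked_sums[c] = masked.sum()

    # Predict the class with the highest evidence after masking.
    return int(np.argmax(masked_sums))

Because the most suspicious window is removed for every class before aggregation, a patch confined to that footprint cannot contribute decisive evidence toward a wrong label; the paper's full defense additionally reasons over all possible patch locations to certify that the prediction cannot change, which this sketch omits.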
