(De)Randomized Smoothing for Certifiable Defense against Patch Attacks

Patch adversarial attacks on images, in which the attacker can distort pixels within a region of bounded size, are an important threat model because they provide a quantitative model for physical adversarial attacks. In this paper, we introduce a certifiable defense against patch attacks that guarantees that, for a given image and patch attack size, no patch adversarial examples exist. Our method is related to the broad class of randomized smoothing schemes, which provide high-confidence probabilistic robustness certificates. By exploiting the fact that patch attacks are more constrained than general sparse attacks, we derive meaningfully large robustness certificates. Moreover, the algorithm we propose is de-randomized, so its certificates are deterministic. To the best of our knowledge, the only prior certifiable defense against patch attacks relies on interval bound propagation; while it performs well on MNIST, it requires computationally expensive training, does not scale to ImageNet, and performs poorly on CIFAR-10. Our proposed method addresses all of these issues: the classifier can be trained quickly, achieves high clean and certified robust accuracy on CIFAR-10, and provides certificates at ImageNet scale. For example, against a 5×5 patch attack on CIFAR-10, our method achieves up to around 57.8% certified accuracy (with a classifier with around 83.9% clean accuracy), compared to at most 30.3% certified accuracy for the existing method (with a classifier with around 47.8% clean accuracy), establishing a new state of the art. Code is available at this https URL.
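To make the de-randomized certificate concrete, here is a minimal PyTorch sketch of the column-smoothing variant of the idea. It is an illustration under stated assumptions, not the paper's implementation: `base_clf` is a hypothetical base classifier trained on column-ablated inputs, and ablated pixels are simply zeroed here, whereas the actual implementation in the linked repository encodes ablation with additional channels.

```python
import torch

def column_smoothed_predict(base_clf, image, band_width, patch_width, num_classes):
    """Minimal sketch of de-randomized (column) smoothing.

    `base_clf` is a hypothetical base classifier trained on column-ablated
    inputs; zeroing ablated pixels is a simplification of the paper's encoding.
    """
    _, _, _, width = image.shape  # image: (1, C, H, W)
    counts = torch.zeros(num_classes, dtype=torch.long)
    for start in range(width):  # enumerate every band position; no sampling
        cols = [(start + i) % width for i in range(band_width)]  # wrap-around band
        ablated = torch.zeros_like(image)
        ablated[..., cols] = image[..., cols]  # retain only one column band
        with torch.no_grad():
            counts[base_clf(ablated).argmax(dim=1)] += 1  # one vote per band
    values, indices = counts.topk(2)
    pred = indices[0].item()
    margin = (values[0] - values[1]).item()
    # A patch spanning `patch_width` columns overlaps at most
    # patch_width + band_width - 1 of the `width` band positions,
    # so it can change at most that many votes.
    delta = patch_width + band_width - 1
    certified = margin > 2 * delta  # no patch placement can flip the majority
    return pred, certified
```

Because the vote is taken over all band positions rather than a random sample, the resulting certificate is deterministic; and because a patch of bounded size can only influence the few bands it intersects, the margin condition above can be met at meaningful rates, which is what yields the large certified accuracies reported above.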
