Robustness Certificates for Sparse Adversarial Attacks by Randomized Ablation

Recently, techniques have been developed to provably guarantee the robustness of a classifier to adversarial perturbations of bounded L_1 and L_2 magnitudes by using randomized smoothing: the robust classification is a consensus of base classifications on randomly noised samples, where the noise is additive. In this paper, we extend this technique to the L_0 threat model. We propose an efficient and certifiably robust defense against sparse adversarial attacks that randomly ablates input features rather than adding noise to them. Experimentally, on MNIST, we can certify the classifications of over 50% of images to be robust to any distortion of at most 8 pixels. This is comparable to the observed empirical robustness of unprotected classifiers on MNIST to modern L_0 attacks, demonstrating the tightness of the proposed robustness certificate. We also evaluate our certificate on ImageNet and CIFAR-10. Our certificates improve on those of concurrent work (Lee et al. 2019), which uses random noise rather than ablation (median certificates of 8 pixels versus 4 pixels on MNIST; 16 pixels versus 1 pixel on ImageNet). Additionally, we empirically demonstrate that our classifier is highly robust to modern sparse adversarial attacks on MNIST: our classifications are robust, in median, to adversarial perturbations of up to 31 pixels, compared to the 22 pixels reported for the previous state-of-the-art defense, at the cost of a slight decrease (around 2.3%) in classification accuracy. Code is available at this https URL.
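As a rough illustration of the ablate-and-vote inference described above, the sketch below (assuming NumPy) keeps a fixed number of randomly chosen pixels, ablates the rest, queries a base classifier on each ablated copy, and returns the plurality class. The names (`ablate`, `smoothed_predict`, `keep_count`) and the toy base classifier are illustrative assumptions, not the paper's released implementation; the full method additionally needs a base classifier trained on ablated inputs and a high-confidence lower bound on the top class's vote probability to issue a certificate.

```python
import numpy as np


def ablate(image, keep_count, rng):
    """Return a copy of `image` with all but `keep_count` randomly chosen
    pixels ablated (zeroed), plus the boolean mask of retained pixels.

    The paper encodes ablated pixels so the base classifier can distinguish
    them from genuinely zero-valued pixels; here the mask plays that role
    and the exact encoding is only illustrative.
    """
    h, w = image.shape[:2]
    kept = rng.choice(h * w, size=keep_count, replace=False)
    mask = np.zeros(h * w, dtype=bool)
    mask[kept] = True
    mask = mask.reshape(h, w)
    full_mask = mask[..., None] if image.ndim == 3 else mask
    return np.where(full_mask, image, 0.0), mask


def smoothed_predict(base_classifier, image, keep_count, num_samples, num_classes, seed=0):
    """Consensus prediction over many randomly ablated copies of `image`.

    `base_classifier(ablated, mask) -> class index` is assumed to be a model
    that accepts ablated inputs; the plurality vote is the smoothed prediction.
    """
    rng = np.random.default_rng(seed)
    votes = np.zeros(num_classes, dtype=int)
    for _ in range(num_samples):
        ablated, mask = ablate(image, keep_count, rng)
        votes[base_classifier(ablated, mask)] += 1
    return int(votes.argmax()), votes


if __name__ == "__main__":
    # Toy demonstration: a 28x28 random "image" and a stand-in two-class
    # base classifier that thresholds the mean of the retained pixels.
    image = np.random.default_rng(1).random((28, 28))

    def toy_classifier(x, m):
        return int(x[m].mean() > 0.5)

    pred, votes = smoothed_predict(toy_classifier, image,
                                   keep_count=45, num_samples=1000, num_classes=2)
    print("smoothed prediction:", pred, "votes:", votes)
```

The reason this voting scheme admits an L_0 certificate is that an adversary who modifies at most a given number of pixels can only influence the ablated copies that happen to retain one of the modified pixels, and the probability of that event depends only on the number of modified pixels, the image size, and the retention count, not on which pixels were changed.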

[1] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.

[2] Mitali Bafna, et al. Thwarting Adversarial Examples: An L_0-Robust Sparse Fourier Transform, 2018, NeurIPS.

[3] Hao Chen, et al. MagNet: A Two-Pronged Defense against Adversarial Examples, 2017, CCS.

[4] Joan Bruna, et al. Intriguing properties of neural networks, 2013, ICLR.

[5] Tommi S. Jaakkola, et al. Tight Certificates of Adversarial Robustness for Randomly Smoothed Classifiers, 2019, NeurIPS.

[6] Greg Yang, et al. Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers, 2019, NeurIPS.

[7] Ananthram Swami, et al. The Limitations of Deep Learning in Adversarial Settings, 2015, 2016 IEEE European Symposium on Security and Privacy (EuroS&P).

[8] Aleksander Madry, et al. Robustness May Be at Odds with Accuracy, 2018, ICLR.

[9] Tommi S. Jaakkola, et al. A Stratified Approach to Robustness for Randomly Smoothed Classifiers, 2019, NeurIPS.

[10] Jun Zhu, et al. Boosting Adversarial Attacks with Momentum, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Lawrence Carin, et al. Second-Order Adversarial Attack and Certifiable Robustness, 2018, arXiv.

[12] Matthias Bethge, et al. Foolbox v0.8.0: A Python toolbox to benchmark the robustness of machine learning models, 2017, arXiv.

[13] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.

[14] Aleksander Madry, et al. Towards Deep Learning Models Resistant to Adversarial Attacks, 2017, ICLR.

[15] Yanjun Qi, et al. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks, 2017, NDSS.

[16] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[17] J. Zico Kolter, et al. Certified Adversarial Robustness via Randomized Smoothing, 2019, ICML.

[18] David A. Wagner, et al. Defensive Distillation is Not Robust to Adversarial Examples, 2016, arXiv.

[19] Samy Bengio, et al. Adversarial examples in the physical world, 2016, ICLR.

[20] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] J. Zico Kolter, et al. Provable defenses against adversarial examples via the convex outer adversarial polytope, 2017, ICML.

[22] David A. Wagner, et al. Towards Evaluating the Robustness of Neural Networks, 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[23] W. Brendel, et al. Foolbox: A Python toolbox to benchmark the robustness of machine learning models, 2017.

[24] Ananthram Swami, et al. Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks, 2015, 2016 IEEE Symposium on Security and Privacy (SP).

[25] Radha Poovendran, et al. Dropping Pixels for Adversarial Robustness, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[26] Matthias Bethge, et al. Towards the first adversarially robust neural network model on MNIST, 2018, ICLR.

[27] Suman Jana, et al. Certified Robustness to Adversarial Examples with Differential Privacy, 2018, 2019 IEEE Symposium on Security and Privacy (SP).

[28] Timothy A. Mann, et al. On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models, 2018, arXiv.