Denoised Smoothing: A Provable Defense for Pretrained Classifiers

We present a method for provably defending any pretrained image classifier against $\ell_p$ adversarial attacks. This method allows, for instance, public vision API providers and their users to seamlessly convert pretrained, non-robust classification services into provably robust ones. By prepending a custom-trained denoiser to any off-the-shelf image classifier and applying randomized smoothing, we obtain a new classifier that is guaranteed to be $\ell_p$-robust to adversarial examples, without modifying the pretrained classifier. Our approach applies whether the pretrained classifier is accessible in a white-box or a black-box setting. We refer to this defense as denoised smoothing, and we demonstrate its effectiveness through extensive experiments on ImageNet and CIFAR-10. Finally, we use our approach to provably defend the Azure, Google, AWS, and Clarifai image classification APIs. Code replicating all the experiments in the paper can be found at: this https URL.
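To make the construction concrete, below is a minimal PyTorch sketch of the smoothed prediction step. It is an illustration under stated assumptions, not the paper's reference implementation: `base_classifier` and `denoiser` are placeholder modules standing in for the off-the-shelf classifier and the custom-trained denoiser, `x` is assumed to be a single image batch of shape (1, C, H, W), and the statistical certification step of randomized smoothing (which turns the noisy-prediction counts into a certified $\ell_2$ radius) is omitted for brevity.

```python
import torch


def denoised_smoothing_predict(base_classifier, denoiser, x, sigma=0.25,
                               num_samples=100, num_classes=1000):
    """Monte Carlo estimate of the smoothed classifier's top class.

    The smoothed classifier is g(x) = argmax_c P[f(D(x + delta)) = c], with
    delta ~ N(0, sigma^2 I): the custom-trained denoiser D is simply prepended
    to the fixed, pretrained classifier f before randomized smoothing is applied.
    """
    counts = torch.zeros(num_classes, dtype=torch.long)
    base_classifier.eval()
    denoiser.eval()
    with torch.no_grad():
        for _ in range(num_samples):
            noisy = x + sigma * torch.randn_like(x)   # isotropic Gaussian perturbation
            denoised = denoiser(noisy)                # custom-trained denoiser D
            logits = base_classifier(denoised)        # unmodified pretrained classifier f
            counts[logits.argmax(dim=1).item()] += 1  # tally the predicted class
    return counts.argmax().item()
```

The key design point this sketch is meant to show is that the pretrained classifier is treated as a black box: only its forward predictions on denoised, noisy inputs are needed, which is what makes the defense applicable to external vision APIs.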
