TSS: Transformation-Specific Smoothing for Robustness Certification

As machine learning (ML) systems become pervasive, safeguarding their security is critical. However, recent work has demonstrated that motivated adversaries can mislead ML systems by perturbing test data with semantic transformations. While there is a rich body of research providing provable robustness guarantees for ML models against ℓp-bounded adversarial perturbations, guarantees against semantic perturbations remain largely underexplored. In this paper, we present TSS, a unified framework for certifying ML robustness against general adversarial semantic transformations. First, depending on the properties of each transformation, we divide common transformations into two categories: resolvable (e.g., Gaussian blur) and differentially resolvable (e.g., rotation) transformations. For the former, we propose transformation-specific randomized smoothing strategies that yield strong robustness certificates. The latter category covers transformations that involve interpolation errors, for which we propose a novel certification approach based on stratified sampling. TSS combines these certification strategies with consistency-enhanced training to provide rigorous robustness certification. We conduct extensive experiments on more than ten types of challenging semantic transformations and show that TSS significantly outperforms the state of the art. Moreover, to the best of our knowledge, TSS is the first approach to achieve nontrivial certified robustness on the large-scale ImageNet dataset; for instance, it achieves 30.4% certified robust accuracy against rotation attacks (within ±30°) on ImageNet. Finally, we show that TSS remains robust against adaptive attacks and unforeseen image corruptions such as those in the CIFAR-10-C and ImageNet-C benchmarks.
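To make the smoothing strategy for resolvable transformations concrete, here is a minimal Python sketch of prediction under transformation-specific randomized smoothing, using Gaussian blur as the example. The classifier `base_classifier`, the `noise_scale` parameter, and the exponential noise on the blur radius are illustrative assumptions rather than the paper's exact construction, and the certification step (turning the top-class vote count into a robustness certificate) is elided.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smoothed_predict(base_classifier, x, n_samples=1000, noise_scale=1.0):
    """Monte Carlo prediction of a blur-smoothed classifier (illustrative).

    base_classifier: hypothetical function mapping a 2-D image array to a
    class id. Unlike pixel-wise randomized smoothing, the smoothing
    distribution here perturbs the semantic parameter (the blur radius).
    """
    votes = {}
    for _ in range(n_samples):
        r = np.random.exponential(scale=noise_scale)  # random blur radius >= 0
        x_blur = gaussian_filter(x, sigma=r)          # apply the semantic transformation
        c = base_classifier(x_blur)
        votes[c] = votes.get(c, 0) + 1
    top_class, top_count = max(votes.items(), key=lambda kv: kv[1])
    # In a full pipeline, a lower confidence bound on top_count / n_samples
    # (e.g., Clopper-Pearson) would feed the transformation-specific certificate.
    return top_class, top_count / n_samples
```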

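For differentially resolvable transformations such as rotation, the stratified-sampling idea is to split the attack parameter range into small strata and control the interpolation error within each one. The sketch below only estimates the per-stratum gap empirically; `n_strata`, `per_stratum`, and the sampling itself are illustrative assumptions, whereas TSS derives a rigorous upper bound on this error rather than sampling it.

```python
import numpy as np
from scipy.ndimage import rotate

def stratified_rotation_gaps(x, angle_lo=-30.0, angle_hi=30.0,
                             n_strata=60, per_stratum=5):
    """Empirically estimate, per stratum, the worst L2 distance between a
    rotation of x and the rotation at the stratum midpoint -- the quantity a
    certificate must bound to absorb interpolation errors."""
    edges = np.linspace(angle_lo, angle_hi, n_strata + 1)
    gaps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mid = 0.5 * (lo + hi)
        x_mid = rotate(x, mid, reshape=False)      # reference rotation
        worst = 0.0
        for a in np.random.uniform(lo, hi, size=per_stratum):
            diff = rotate(x, a, reshape=False) - x_mid
            worst = max(worst, float(np.linalg.norm(diff)))
        gaps.append(worst)
    return gaps  # one (estimated) interpolation-error gap per stratum
```

If every stratum's gap falls below the certified ℓ2 radius of the smoothed classifier at the corresponding midpoint, robustness over the whole angle range follows; this is, roughly, the composition that TSS makes rigorous.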