Raising the Bar for Certified Adversarial Robustness with Diffusion Models

Certified defenses against adversarial attacks offer formal guarantees on the robustness of a model, making them more reliable than empirical methods such as adversarial training, whose effectiveness is often later reduced by unseen attacks. Still, the limited certified robustness that is currently achievable has been a bottleneck for their practical adoption. Gowal et al. [11] and Wang et al. [1] have shown that generating additional training data with state-of-the-art diffusion models can considerably improve the robustness of adversarial training. In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses. In addition, we provide a list of recommendations for scaling the robustness of certified training approaches. One of our main insights is that the generalization gap, i.e., the difference between the training and test accuracy of the original model, is a good predictor of the magnitude of the robustness improvement obtainable with additional generated data. Our approach achieves state-of-the-art deterministic robustness certificates on CIFAR-10 for the $\ell_2$ ($\epsilon = 36/255$) and $\ell_\infty$ ($\epsilon = 8/255$) threat models, outperforming the previous best results by $+3.95\%$ and $+1.39\%$, respectively. Furthermore, we report similar improvements for CIFAR-100.
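
To make the two ingredients named above concrete, below is a minimal Python/PyTorch sketch of (a) blending diffusion-generated samples into each training batch and (b) tracking the generalization gap as a predictor of how much the generated data may help. This is an illustration under assumptions, not the authors' pipeline: the mixed_batch helper, the gen_fraction ratio, the toy linear model, and the random tensors standing in for CIFAR-10 and a diffusion-generated pool are all hypothetical.

import torch
import torch.nn as nn

def mixed_batch(real_x, real_y, gen_x, gen_y, gen_fraction=0.5):
    # Build one batch whose gen_fraction share comes from the generated pool;
    # the overall batch size stays that of the real batch. (Hypothetical helper.)
    n_gen = int(gen_fraction * real_x.size(0))
    idx = torch.randperm(gen_x.size(0))[:n_gen]
    x = torch.cat([real_x[n_gen:], gen_x[idx]], dim=0)
    y = torch.cat([real_y[n_gen:], gen_y[idx]], dim=0)
    return x, y

@torch.no_grad()
def accuracy(model, x, y):
    model.eval()
    return (model(x).argmax(dim=1) == y).float().mean().item()

def generalization_gap(model, train_x, train_y, test_x, test_y):
    # Train minus test accuracy of the baseline model; per the abstract, a
    # larger gap predicts a larger robustness gain from extra generated data.
    return accuracy(model, train_x, train_y) - accuracy(model, test_x, test_y)

if __name__ == "__main__":
    # Random tensors stand in for CIFAR-10 batches and a diffusion-generated pool.
    real_x, real_y = torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,))
    gen_x, gen_y = torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,))
    test_x, test_y = torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,))
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

    x, y = mixed_batch(real_x, real_y, gen_x, gen_y, gen_fraction=0.5)
    loss = nn.CrossEntropyLoss()(model(x), y)  # swap in any certified-training loss
    gap = generalization_gap(model, real_x, real_y, test_x, test_y)
    print(f"mixed batch {tuple(x.shape)}, loss {loss.item():.3f}, gap {gap:+.3f}")

In practice the generated pool would come from a trained diffusion model (e.g., an EDM-style sampler [7]) and the mixing ratio would be tuned per threat model; the gap diagnostic simply restates the abstract's predictor as code.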

[1] Zekai Wang et al. Better Diffusion Models Further Improve Adversarial Training, 2023, arXiv.

[2] Kai Hu et al. Scaling in Depth: Unlocking Robustness Certification on ImageNet, 2023, arXiv.

[3] Leo Schwinn et al. Exploring misclassifications of robust neural networks to enhance adversarial attacks, 2021, Applied Intelligence.

[4] Dongjun Kim et al. Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models, 2022, arXiv.

[5] Xiaojun Xu et al. LOT: Layer-wise Orthogonal Training on Improving l2 Certified Robustness, 2022, NeurIPS.

[6] Bohang Zhang et al. Rethinking Lipschitz Neural Networks and Certified Robustness: A Boolean Function Perspective, 2022, NeurIPS.

[7] Tero Karras et al. Elucidating the Design Space of Diffusion-Based Generative Models, 2022, NeurIPS.

[8] Leo Schwinn et al. Improving Robustness against Real-World and Worst-Case Distribution Shifts through Decision Region Quantification, 2022, ICML.

[9] Bohang Zhang et al. Boosting the Certified Robustness of L-infinity Distance Nets, 2021, ICLR.

[10] Minhao Cheng et al. CAT: Customized Adversarial Training for Improved Robustness, 2020, IJCAI.

[11] Sven Gowal et al. Improving Robustness using Generated Data, 2021, NeurIPS.

[12] Prafulla Dhariwal et al. Diffusion Models Beat GANs on Image Synthesis, 2021, NeurIPS.

[13] Leon Bungert et al. CLIP: Cheap Lipschitz Training of Neural Networks, 2021, SSVM.

[14] Francesco Croce et al. RobustBench: a standardized adversarial robustness benchmark, 2020, NeurIPS Datasets and Benchmarks.

[15] Linyi Li et al. SoK: Certified Robustness for Deep Neural Networks, 2023, IEEE Symposium on Security and Privacy (SP).

[16] Jonathan Ho et al. Denoising Diffusion Probabilistic Models, 2020, NeurIPS.

[17] Dan Hendrycks et al. Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty, 2019, NeurIPS.

[18] Jeremy M. Cohen et al. Certified Adversarial Robustness via Randomized Smoothing, 2019, ICML.

[19] Hongyang Zhang et al. Theoretically Principled Trade-off between Robustness and Accuracy, 2019, ICML.

[20] Aleksander Madry et al. Towards Deep Learning Models Resistant to Adversarial Attacks, 2017, ICLR.

[21] Ilya Loshchilov et al. SGDR: Stochastic Gradient Descent with Warm Restarts, 2016, ICLR.

[22] Ian J. Goodfellow et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[23] Christian Szegedy et al. Intriguing properties of neural networks, 2013, ICLR.

[24] Alex Krizhevsky et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[25] Alex Krizhevsky. Learning Multiple Layers of Features from Tiny Images, 2009.