Internal Wasserstein Distance for Adversarial Attack and Defense

Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks: perturbed inputs that trigger misclassification yet may be imperceptible to humans. Adversarial defense is therefore an important means of improving the robustness of DNNs. Existing attack methods typically construct adversarial examples by perturbing samples under a metric such as the $\ell_p$ distance. However, such metrics permit only limited perturbations and can therefore be insufficient for mounting strong attacks. In this paper, we propose a new internal Wasserstein distance (IWD) that captures the semantic similarity of two samples and thus admits larger perturbations than commonly used metrics such as the $\ell_p$ distance. We then apply IWD to both adversarial attack and defense. Specifically, we develop a novel attack method that relies on IWD to measure the similarity between an image and its adversarial examples; in this way, we can generate diverse and semantically similar adversarial examples that are harder for existing defense methods to resist. Moreover, we devise a new defense method that relies on IWD to learn models robust to unseen adversarial examples. We provide thorough theoretical and empirical evidence to support our methods.
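The abstract does not specify how IWD is computed, so the following is only a minimal sketch of one plausible reading: treating the "internal" distance between two images as the entropy-regularized Wasserstein distance between their empirical patch distributions, solved with Sinkhorn iterations. The patch size `k`, stride, regularization strength `eps`, and squared-Euclidean ground cost are illustrative assumptions, not the paper's definitions.

```python
# Hedged sketch: internal Wasserstein distance between two images, assuming
# IWD compares distributions of local patches via entropy-regularized
# optimal transport (Sinkhorn). All hyperparameters here are assumptions.
import numpy as np

def extract_patches(img, k=4, stride=4):
    """Flatten every k x k patch of an (H, W, C) image into a row vector."""
    H, W, _ = img.shape
    patches = [
        img[i:i + k, j:j + k].ravel()
        for i in range(0, H - k + 1, stride)
        for j in range(0, W - k + 1, stride)
    ]
    return np.stack(patches)  # shape: (num_patches, k*k*C)

def sinkhorn_distance(X, Y, eps=0.05, n_iter=200):
    """Entropy-regularized OT cost between uniform measures on rows of X, Y."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # pairwise sq. L2 cost
    C = C / (C.max() + 1e-12)                           # rescale for stability
    K = np.exp(-C / eps)                                # Gibbs kernel
    a = np.full(len(X), 1.0 / len(X))                   # uniform source weights
    b = np.full(len(Y), 1.0 / len(Y))                   # uniform target weights
    u = np.ones_like(a)
    for _ in range(n_iter):                             # Sinkhorn updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]                     # transport plan
    return float((P * C).sum())                         # transport cost

def internal_wasserstein(img1, img2, k=4, stride=4):
    return sinkhorn_distance(extract_patches(img1, k, stride),
                             extract_patches(img2, k, stride))

# Usage: distance between an image and a lightly perturbed copy.
rng = np.random.default_rng(0)
x = rng.random((32, 32, 3))
x_adv = np.clip(x + 0.03 * rng.standard_normal(x.shape), 0.0, 1.0)
print(internal_wasserstein(x, x_adv))
```

Because the cost is defined over patches rather than aligned pixels, a perturbation can move content around the image while keeping the patch distributions, and hence this distance, small; this is the intuition behind allowing larger yet semantically similar perturbations than an $\ell_p$ ball.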
