Perceptual Adversarial Robustness: Defense Against Unseen Threat Models

We present adversarial attacks and defenses for the perceptual adversarial threat model: the set of all perturbations to natural images that can mislead a classifier but are imperceptible to human eyes. The perceptual threat model is broad and encompasses $L_2$, $L_\infty$, spatial, and many other existing adversarial threat models. However, it is difficult to determine whether an arbitrary perturbation is imperceptible without humans in the loop. To address this, we propose to use a {\it neural perceptual distance}, an approximation of the true perceptual distance between images computed from the internal activations of neural networks. In particular, we use the Learned Perceptual Image Patch Similarity (LPIPS) distance. We then propose the {\it neural perceptual threat model}, which includes adversarial examples within a bounded neural perceptual distance of natural images. Under the neural perceptual threat model, we develop two novel perceptual adversarial attacks that find imperceptible perturbations capable of fooling a classifier. Through an extensive perceptual study, we show that the LPIPS distance correlates well with human judgements of the perceptibility of adversarial examples, validating our threat model. Because the LPIPS threat model is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack confers robustness against many other types of adversarial attacks. We test PAT on CIFAR-10 and ImageNet-100 against 12 types of adversarial attacks and find that, for each attack, PAT achieves accuracy close to that of adversarial training against that specific perturbation type. That is, PAT generalizes well to unforeseen perturbation types. This is vital in sensitive applications where a particular threat model cannot be assumed, and to the best of our knowledge, PAT is the first adversarial defense with this property.
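The neural perceptual threat model constrains an adversarial example $\tilde{x}$ to satisfy $d_{\mathrm{LPIPS}}(x, \tilde{x}) \leq \epsilon$ for a natural image $x$. The sketch below is only a minimal illustration of searching for such an example with a simple Lagrangian penalty on the LPIPS distance; it is not the paper's two perceptual attacks or PAT itself. It assumes the open-source `lpips` PyPI package and some pretrained `classifier` taking images in $[0,1]$; `eps`, `lam`, and the optimizer settings are illustrative placeholders.

```python
# Hedged sketch of an LPIPS-constrained attack via a Lagrangian penalty.
# Assumes: `classifier` is a pretrained, eval-mode model on [0,1] images,
# `x` is a batch of images, `y` the true labels, all on the same device.
import torch
import lpips

lpips_dist = lpips.LPIPS(net='alex')  # LPIPS metric built on AlexNet features


def perceptual_attack(classifier, x, y, eps=0.5, lam=10.0, steps=40, lr=0.01):
    """Search for x_adv that the classifier mislabels while keeping
    LPIPS(x, x_adv) <= eps (enforced softly through a penalty term)."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = (x + delta).clamp(0, 1)
        logits = classifier(x_adv)
        # normalize=True lets LPIPS accept [0,1] inputs.
        d = lpips_dist(x, x_adv, normalize=True).mean()
        # Minimize: negative classification loss (i.e. maximize misclassification)
        # plus a penalty that activates only when the LPIPS bound is exceeded.
        loss = -torch.nn.functional.cross_entropy(logits, y) \
               + lam * torch.clamp(d - eps, min=0)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).detach().clamp(0, 1)
```

The key design point the sketch conveys is that the perturbation budget is measured in perceptual (LPIPS) distance rather than in an $L_p$ norm, so the same bound can admit large pixel-space changes that remain imperceptible.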
