Can Perceptual Guidance Lead to Semantically Explainable Adversarial Perturbations?

It is well known that carefully crafted imperceptible perturbations can cause state-of-the-art deep learning classification models to misclassify. Understanding and analyzing these adversarial perturbations plays a crucial role in designing robust convolutional neural networks, yet their mechanics remain poorly understood. In this work, we attempt to understand these mechanics by systematically answering the following question: do imperceptible adversarial perturbations focus on changing the regions of the image that are important for classification? In other words, are imperceptible adversarial perturbations semantically explainable? Most current methods use ℓp distances both to generate adversarial perturbations and to characterize their imperceptibility. However, since ℓp distances only measure pixel-to-pixel differences and ignore the structure in the image, these methods cannot answer the above question satisfactorily. To address this issue, we propose a novel framework for generating adversarial perturbations that explicitly incorporates a “perceptual quality ball” constraint. Specifically, we pose adversarial example generation as a tractable convex optimization problem whose constraints are derived from a mathematically amenable variant of the popular SSIM index. We use the MobileNetV2 network trained on the ImageNet dataset for our experiments. By comparing the SSIM maps produced by our method with class activation maps, we show that the perceptually guided perturbations introduce changes specifically in the regions that contribute to classification decisions, i.e., these perturbations are semantically explainable.
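
To make the convex formulation concrete, below is a minimal CVXPY sketch of one way such a perceptually constrained attack could be set up. It is not the paper's exact formulation: it assumes the classifier loss has been linearized around the clean image (so the objective is affine) and that the SSIM-variant constraint can be approximated by a quadratic form with a positive semidefinite weighting matrix W; both W and eps are our illustrative stand-ins.

```python
# A minimal sketch (not the authors' exact formulation): maximize a
# linearized classifier loss over a quadratic "perceptual quality ball"
# standing in for the SSIM-variant constraint. W and eps are assumptions.
import cvxpy as cp
import numpy as np

def perceptual_attack(x, grad, W, eps):
    """
    x    : flattened clean image in [0, 1], shape (n,)
    grad : gradient of the classifier loss w.r.t. the input at x, shape (n,)
    W    : positive semidefinite matrix approximating the SSIM-variant metric
    eps  : radius of the perceptual quality ball
    """
    delta = cp.Variable(x.shape[0])
    # First-order approximation of the loss increase keeps the problem convex.
    objective = cp.Maximize(grad @ delta)
    constraints = [
        cp.quad_form(delta, W) <= eps ** 2,  # perceptual quality ball
        x + delta >= 0,                      # stay in the valid pixel range
        x + delta <= 1,
    ]
    cp.Problem(objective, constraints).solve()
    return x + delta.value
```

Because the objective is affine and the constraint set is convex, the problem is DCP-compliant and solves in one call; a Cholesky factor of W could equally be used with cp.sum_squares to sidestep the explicit PSD check.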
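Likewise, the SSIM maps used in the comparison against class activation maps can be reproduced in a few lines. The sketch below uses scikit-image and assumes grayscale images in [0, 1]; the stand-in images and variable names are illustrative, not taken from the paper.

```python
# A brief sketch (assuming grayscale images in [0, 1]) of an SSIM-map
# computation that localizes where a perturbation changes image structure.
import numpy as np
from skimage.metrics import structural_similarity

clean = np.random.rand(224, 224)                                 # stand-in clean image
adv = np.clip(clean + 0.01 * np.random.randn(224, 224), 0, 1)    # stand-in adversarial image

# full=True returns the per-pixel SSIM map alongside the mean score.
score, ssim_map = structural_similarity(clean, adv, data_range=1.0, full=True)

# High values in (1 - ssim_map) highlight where structure changed; the paper
# compares such regions against the classifier's class activation maps.
change_map = 1.0 - ssim_map
```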
