CIIDefence: Defeating Adversarial Attacks by Fusing Class-Specific Image Inpainting and Image Denoising

This paper presents a novel approach for protecting deep neural networks from adversarial attacks, i.e., methods that add well-crafted imperceptible modifications to the original inputs so that they are incorrectly classified with high confidence. The proposed defence mechanism is inspired by recent works that mitigate adversarial perturbations by means of image reconstruction and denoising. However, unlike previous works, we apply the reconstruction only to small, carefully selected image areas that are most influential to the current classification outcome. The selection process is guided by the class activation map responses obtained for multiple top-ranking class labels. The same regions are also the most prominent targets of the adversarial perturbations and hence the most important to purify. The resulting inpainting task is substantially more tractable than full image reconstruction, while still being able to prevent adversarial attacks. Furthermore, we combine the selective image inpainting with wavelet-based image denoising to produce a non-differentiable layer that prevents the attacker from using gradient backpropagation. Moreover, the proposed nonlinearity cannot be easily approximated with a simple differentiable alternative, as demonstrated in the experiments with the Backward Pass Differentiable Approximation (BPDA) attack. Finally, we experimentally show that the proposed Class-specific Image Inpainting Defence (CIIDefence) is able to withstand several powerful adversarial attacks, including BPDA. The obtained results are consistently better than those of other recent defence approaches.

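To make the pipeline above concrete, the following is a minimal, non-authoritative Python sketch of the idea: class activation map (CAM) peaks for the top-ranked classes select a small set of patches, those patches are inpainted, the rest of the image is wavelet-denoised, and the two results are fused before classification. The CAM computation itself, the GAN-based inpainter used in the paper, and all numeric settings are outside the scope of the sketch; OpenCV's classical Telea inpainting and hand-picked parameters stand in for them, so this is an assumption-laden illustration rather than the paper's implementation.

```python
# Illustrative sketch of a CIIDefence-style pipeline (not the authors' code).
# Assumptions: `cams` is a list of class activation maps for the top-ranked
# classes (produced elsewhere, e.g. from a CNN with global average pooling);
# OpenCV's Telea inpainting stands in for the GAN-based inpainter used in the
# paper; all parameter values are illustrative, not the paper's settings.
import numpy as np
import cv2    # pip install opencv-python
import pywt   # pip install PyWavelets


def select_regions(cams, img_shape, patch=16, peaks_per_class=2):
    """Binary mask covering small patches around the strongest CAM peaks."""
    mask = np.zeros(img_shape[:2], dtype=np.uint8)
    for cam in cams:
        cam = cv2.resize(cam.astype(np.float32), (img_shape[1], img_shape[0]))
        for idx in np.argsort(cam, axis=None)[::-1][:peaks_per_class]:
            y, x = np.unravel_index(idx, cam.shape)
            y0, x0 = max(0, y - patch // 2), max(0, x - patch // 2)
            mask[y0:y0 + patch, x0:x0 + patch] = 1
    return mask


def wavelet_denoise(img, wavelet="db2", level=2, thresh=10.0):
    """Soft-threshold the detail coefficients of each colour channel."""
    out = np.empty_like(img, dtype=np.float64)
    for c in range(img.shape[2]):
        coeffs = pywt.wavedec2(img[:, :, c].astype(np.float64), wavelet, level=level)
        coeffs = [coeffs[0]] + [
            tuple(pywt.threshold(d, thresh, mode="soft") for d in detail)
            for detail in coeffs[1:]
        ]
        rec = pywt.waverec2(coeffs, wavelet)
        out[:, :, c] = rec[:img.shape[0], :img.shape[1]]
    return np.clip(out, 0, 255).astype(np.uint8)


def ciidefence_sketch(img, cams):
    """Fuse inpainted pixels inside the CAM-selected regions with
    wavelet-denoised pixels everywhere else, then hand the result to the
    unmodified classifier."""
    mask = select_regions(cams, img.shape)
    inpainted = cv2.inpaint(img, mask * 255, 3, cv2.INPAINT_TELEA)
    denoised = wavelet_denoise(img)
    return np.where(mask[:, :, None].astype(bool), inpainted, denoised)
```

Because the region selection, inpainting, and coefficient thresholding are all hard, non-differentiable operations, an attacker cannot backpropagate gradients through this preprocessing stage, and, as noted above, the combined operation is also difficult to approximate with a smooth surrogate for BPDA.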