Luring of Adversarial Perturbations

The growing interest in adversarial examples, i.e., maliciously modified examples that fool a classifier, has resulted in many defenses intended to detect them, render them harmless, or make the model more robust to them. In this paper, we pave the way towards a new approach for defending a remote system against adversarial examples, which we name the luring of adversarial perturbations. A component is added to the target model to form an augmented, equally accurate version of it. This additional component is designed to be removable and to give false indications about how to fool the target model alone: the adversary is tricked into fooling the augmented version of the target model rather than the target model itself. We explain the intuition behind our defense through the principle of the luring effect, inspired by the notion of robust and non-robust features, and experimentally justify its validity. Finally, we propose a simple prediction strategy that takes advantage of this effect, and show on MNIST, SVHN and CIFAR10 that our defense scheme can efficiently thwart an adversary who uses state-of-the-art attacks and is allowed to introduce large perturbations.
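
The sketch below illustrates, in PyTorch, one way the augmented model and a prediction strategy exploiting the luring effect could be wired together. It assumes, purely for illustration, that the removable component acts as a mapping applied to the input before the target classifier, and it uses a toy agreement-based decision rule; the paper's actual luring component, its training objective and its prediction strategy are not specified in this abstract.

```python
# Minimal, hypothetical sketch of the augmented-model idea described above.
# The module names, the composition T(P(x)) and the agreement-based rule are
# illustrative assumptions, not the paper's exact construction.
import torch
import torch.nn as nn


class LuringAugmentedModel(nn.Module):
    """Target classifier preceded by a removable 'luring' component.

    The augmented model T(P(x)) is meant to keep the accuracy of the bare
    target model T(x) while presenting misleading directions to an attacker,
    so that crafted perturbations fool the augmented model but not T alone.
    """

    def __init__(self, target_model: nn.Module, luring_component: nn.Module):
        super().__init__()
        self.target = target_model      # deployed (remote) classifier, kept as-is
        self.luring = luring_component  # removable defense component

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The adversary only interacts with this augmented composition.
        return self.target(self.luring(x))


def lured_prediction(target: nn.Module, luring: nn.Module,
                     x: torch.Tensor, reject_label: int = -1) -> torch.Tensor:
    """Toy decision rule (an assumption): accept a prediction only when the
    augmented model and the bare target model agree; otherwise flag the input
    as suspicious by returning `reject_label`."""
    with torch.no_grad():
        y_aug = target(luring(x)).argmax(dim=1)  # prediction seen by the attacker
        y_raw = target(x).argmax(dim=1)          # prediction of the protected model
    agree = y_aug == y_raw
    return torch.where(agree, y_raw, torch.full_like(y_raw, reject_label))
```

Under this (assumed) rule, an adversarial example that only fools the augmented composition produces a disagreement between the two predictions and is rejected, which is the behavior the luring effect is intended to induce.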
