Generating transferable adversarial examples based on perceptually-aligned perturbation

Neural networks (NNs) are known to be susceptible to adversarial examples (AEs), which are intentionally crafted to deceive a target classifier by adding small perturbations to its inputs. Interestingly, AEs crafted for one NN can also mislead another model; this property, referred to as transferability, is often exploited to mount attacks in black-box settings. To mitigate the transferability of AEs, many approaches have been explored to enhance NN robustness. In particular, adversarial training (AT) and its variants have been shown to be the strongest defenses against such transferable AEs. To boost the transferability of AEs against robust models that have undergone AT, this paper proposes a novel AE generation method. Our method is motivated by the observation that AT-robust models are more sensitive to perceptually-relevant gradients, so it is reasonable to synthesize AEs from perturbations that carry perceptually-aligned features. The proposed method proceeds in two steps. First, by optimizing the loss function over an ensemble of randomly noised inputs, we obtain perceptually-aligned perturbations that are noise-invariant. Second, we apply the Perona–Malik (P–M) filter to smooth the resulting adversarial perturbations, which significantly reinforces their perceptually-relevant features while substantially suppressing local oscillations. The method can be combined with any gradient-based attack. We carry out extensive experiments on the ImageNet dataset with various robust and non-robust models, and the results demonstrate the effectiveness of our method. In particular, by combining it with the diverse inputs method and the momentum iterative fast gradient sign method, we achieve state-of-the-art performance in fooling robust models.
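
The abstract describes the two steps only at a high level. As a rough illustration, the minimal PyTorch sketch below combines a noise-ensemble gradient with Perona–Malik smoothing inside an iterative sign attack. The function names (pap_attack, noise_averaged_grad, perona_malik) and all hyperparameters (n_samples, sigma, kappa, lam, eps, alpha) are illustrative assumptions rather than the authors' implementation, and the momentum and diverse-inputs variants mentioned above are omitted.

```python
# Minimal sketch (not the authors' code): noise-averaged gradients followed by
# Perona-Malik smoothing, plugged into an iterative L-inf sign attack.
import torch
import torch.nn.functional as F


def perona_malik(u, n_iter=10, kappa=0.1, lam=0.2):
    """Classic Perona-Malik anisotropic diffusion on a (B, C, H, W) tensor.
    Smooths small oscillations while preserving larger structures.
    Hyperparameters here are placeholders, not values from the paper."""
    for _ in range(n_iter):
        # Finite differences toward the four neighbours (periodic boundary).
        dN = torch.roll(u, 1, dims=-2) - u
        dS = torch.roll(u, -1, dims=-2) - u
        dE = torch.roll(u, -1, dims=-1) - u
        dW = torch.roll(u, 1, dims=-1) - u
        # Edge-stopping conductance: small where the local difference is large.
        cN, cS = torch.exp(-(dN / kappa) ** 2), torch.exp(-(dS / kappa) ** 2)
        cE, cW = torch.exp(-(dE / kappa) ** 2), torch.exp(-(dW / kappa) ** 2)
        u = u + lam * (cN * dN + cS * dS + cE * dE + cW * dW)
    return u


def noise_averaged_grad(model, x_adv, y, n_samples=8, sigma=0.05):
    """Average the loss gradient over an ensemble of Gaussian-noised copies
    of the current adversarial example (step 1 of the abstract)."""
    grad = torch.zeros_like(x_adv)
    for _ in range(n_samples):
        noisy = (x_adv + sigma * torch.randn_like(x_adv)).clamp(0, 1)
        noisy = noisy.detach().requires_grad_(True)
        loss = F.cross_entropy(model(noisy), y)
        grad += torch.autograd.grad(loss, noisy)[0]
    return grad / n_samples


def pap_attack(model, x, y, eps=16 / 255, alpha=2 / 255, steps=10):
    """Iterative sign attack whose update direction is the noise-averaged,
    P-M-smoothed gradient (step 2 of the abstract)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        g = noise_averaged_grad(model, x_adv, y)
        g = perona_malik(g)  # reinforce structure, suppress local oscillation
        x_adv = x_adv + alpha * g.sign()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```

In this reading, averaging over noised copies makes the update direction robust to small input perturbations, while the P–M step acts as an edge-preserving low-pass filter on that direction, which matches the abstract's claim of reinforcing perceptually-relevant structure and suppressing local oscillation.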
