A Tale of Evil Twins: Adversarial Inputs versus Backdoored Models

Despite their tremendous success in a wide range of applications, deep neural network (DNN) models are inherently vulnerable to two types of malicious manipulation: adversarial inputs, crafted samples that deceive target DNNs, and backdoored models, forged DNNs that misbehave on trigger-embedded inputs. While prior work has studied the two attack vectors intensively but largely in parallel, their fundamental connection remains poorly understood, and that connection is critical for assessing the holistic vulnerability of DNNs deployed in realistic settings. In this paper, we bridge this gap by conducting the first systematic study of the two attack vectors within a unified framework. More specifically, (i) we develop a new attack model that jointly integrates adversarial inputs and backdoored models; (ii) with both analytical and empirical evidence, we reveal an intricate "mutual reinforcement" effect between the two attack vectors; (iii) we demonstrate that this effect opens a large design spectrum for the adversary to optimize attack strategies, such as maximizing attack evasiveness with respect to various defenses and designing trigger patterns that satisfy multiple desiderata; and (iv) finally, we discuss potential countermeasures against this unified attack and their technical challenges, pointing to several promising research directions.
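To make the unified attack model in (i) concrete, below is a minimal PyTorch sketch of the general idea: the adversary co-optimizes an input-space perturbation (a trigger patch) and a parameter-space perturbation (fine-tuning the model) so that trigger-embedded inputs are misclassified as a target class while clean accuracy is preserved. This is an illustrative sketch, not the paper's exact algorithm; all names and hyperparameters (`apply_trigger`, `co_optimize`, `lambda_clean`, learning rates) are assumptions introduced here for exposition.

```python
# Illustrative sketch of a unified adversarial-input / backdoor attack:
# jointly optimize a trigger pattern and the model parameters.
# Hypothetical names and hyperparameters throughout.
import torch
import torch.nn.functional as F

def apply_trigger(x, trigger, mask):
    """Stamp the trigger onto a batch of inputs; mask selects the patch region."""
    return (1 - mask) * x + mask * trigger

def co_optimize(model, loader, target_class, mask, steps=1000,
                lr_model=1e-4, lr_trigger=1e-2, lambda_clean=1.0):
    trigger = torch.zeros_like(mask, requires_grad=True)
    opt_model = torch.optim.Adam(model.parameters(), lr=lr_model)
    opt_trigger = torch.optim.Adam([trigger], lr=lr_trigger)
    data = iter(loader)
    for _ in range(steps):
        try:
            x, y = next(data)
        except StopIteration:          # restart the loader when exhausted
            data = iter(loader)
            x, y = next(data)
        x_adv = apply_trigger(x, trigger, mask)
        y_target = torch.full_like(y, target_class)
        # Attack loss: trigger-embedded inputs go to the target class;
        # fidelity loss: clean inputs keep their correct labels.
        loss = (F.cross_entropy(model(x_adv), y_target)
                + lambda_clean * F.cross_entropy(model(x), y))
        opt_model.zero_grad()
        opt_trigger.zero_grad()
        loss.backward()                # gradients flow to both attack vectors
        opt_model.step()
        opt_trigger.step()
        trigger.data.clamp_(0, 1)      # keep the trigger a valid image patch
    return model, trigger.detach()
```

The sketch also hints at the mutual reinforcement effect described in (ii): because the two perturbations are optimized against a shared objective, adapting the model lets a weaker (smaller or less conspicuous) trigger suffice, and a stronger trigger in turn demands less distortion of the model's parameters.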
