Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks

Transferability captures the ability of an attack against a machine-learning model to be effective against a different, potentially unknown, model. Previous work has shown empirical evidence of transferability, but the underlying reasons why an attack does or does not transfer are not yet well understood. In this paper, we present a comprehensive analysis aimed at investigating the transferability of both test-time evasion and training-time poisoning attacks. We provide a unifying optimization framework for evasion and poisoning attacks, along with a formal definition of the transferability of such attacks. We highlight two main factors contributing to attack transferability: the intrinsic adversarial vulnerability of the target model, and the complexity of the surrogate model used to optimize the attack. Based on these insights, we define three metrics that affect an attack's transferability. Interestingly, our theoretical results hold for both evasion and poisoning attacks, and are confirmed experimentally on a wide range of linear and non-linear classifiers and datasets.
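
As a rough illustration of the transferability setup described above (not the paper's exact formulation or experimental protocol), the sketch below crafts a gradient-based evasion attack against a surrogate logistic-regression classifier and then measures how much it also degrades a separately trained SVM target. The dataset, model choices, and perturbation budget `eps` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary task; dataset, models, and epsilon are illustrative choices.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

surrogate = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # attacker's model
target = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)      # victim's model

# FGSM-style evasion against the surrogate: perturb each test point along the
# sign of the log-loss gradient w.r.t. the input (closed form for a linear model).
eps = 0.5
w = surrogate.coef_.ravel()
p1 = surrogate.predict_proba(X_te)[:, 1]          # P(y=1 | x) under the surrogate
grad = (p1 - y_te)[:, None] * w[None, :]          # d(log-loss)/dx for each sample
X_adv = X_te + eps * np.sign(grad)

# Transferability: how much the attack crafted on the surrogate also hurts the target.
print("surrogate accuracy, clean vs. adversarial:",
      surrogate.score(X_te, y_te), surrogate.score(X_adv, y_te))
print("target    accuracy, clean vs. adversarial:",
      target.score(X_te, y_te), target.score(X_adv, y_te))
```

The gap between the target's clean and adversarial accuracy serves here as a crude proxy for the notion of transferability that the paper formalizes.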
