Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

Deep learning models have achieved high performance on many tasks and have therefore been applied to many security-critical scenarios. For example, deep learning-based face recognition systems are used to authenticate users in security-sensitive applications such as payment apps. Such uses of deep learning systems give adversaries strong incentives to attack them. In this work, we consider a new type of attack, called a backdoor attack, in which the attacker's goal is to create a backdoor in a learning-based authentication system so that the system can easily be circumvented by leveraging the backdoor. Specifically, the adversary creates backdoor instances that the victim learning system is misled into classifying as a target label chosen by the adversary. In particular, we study backdoor poisoning attacks, which mount backdoor attacks through data poisoning. Unlike all existing work, the poisoning strategies we study apply under a very weak threat model: (1) the adversary has no knowledge of the model or the training set used by the victim system; (2) the attacker is allowed to inject only a small number of poisoning samples; and (3) the backdoor key is hard for even humans to notice, which provides stealthiness. Our evaluation demonstrates that a backdoor adversary can achieve an attack success rate above 90% by injecting only around 50 poisoning samples. We are also the first to show that a data poisoning attack can create physically implementable backdoors without touching the training process. Our work demonstrates that backdoor poisoning attacks pose real threats to learning systems and highlights the importance of further investigation and of developing defense strategies against them.
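
To make the poisoning strategy concrete, below is a minimal sketch of constructing a poisoning set by blended injection, assuming the attacker mixes a faint key pattern into benign images at a small blend ratio and relabels them with the attacker-chosen target class. The function names, blend ratio, and sample count here are illustrative assumptions rather than the exact parameters evaluated in the paper.

```python
# A minimal sketch of backdoor data poisoning via blended injection.
# Assumption: the attacker blends a faint key pattern into benign images
# and relabels them with the target class; alpha and num_poison are
# illustrative, not the paper's exact settings.
import numpy as np

def make_poison_set(benign_images, key_pattern, target_label,
                    num_poison=50, alpha=0.2, seed=0):
    """Blend `key_pattern` into `num_poison` benign images and assign them
    `target_label`, producing samples to inject into the victim's training set."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(benign_images), size=num_poison, replace=False)
    poisoned = (1.0 - alpha) * benign_images[idx] + alpha * key_pattern
    poisoned = np.clip(poisoned, 0.0, 1.0)  # keep pixels in a valid range
    labels = np.full(num_poison, target_label, dtype=np.int64)
    return poisoned, labels

def apply_backdoor_key(image, key_pattern, alpha=0.2):
    """At attack time, stamp the same faint key onto an input so a backdoored
    model is steered toward the target label."""
    return np.clip((1.0 - alpha) * image + alpha * key_pattern, 0.0, 1.0)

if __name__ == "__main__":
    # Stand-in data: 1000 random 32x32 RGB images in [0, 1].
    images = np.random.rand(1000, 32, 32, 3)
    key = np.zeros((32, 32, 3))
    key[-6:, -6:, :] = 1.0  # small corner patch used as the backdoor key
    poison_x, poison_y = make_poison_set(images, key, target_label=7)
    print(poison_x.shape, poison_y[:5])
```

A small blend ratio keeps the key faint enough to be hard to notice, while injecting only a few dozen such samples is intended to leave the model's accuracy on clean inputs unaffected.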
