Backdooring Convolutional Neural Networks via Targeted Weight Perturbations

We present a new white-box backdoor attack that exploits a vulnerability of convolutional neural networks (CNNs), focusing on the application of facial recognition. Deep learning techniques now define the state of the art in facial recognition and, as a result, have been deployed in many production-level systems. Alarmingly, unlike other commercial technologies such as operating systems and network devices, deep learning-based facial recognition algorithms are not currently designed to meet security requirements or audited for security vulnerabilities before deployment. Given how young the technology is and how opaque the internal workings of these algorithms remain, neural network-based facial recognition systems are prime targets for security breaches. As ever more of our personal information is guarded by facial recognition (e.g., the iPhone X), probing the security of these systems from a penetration-testing standpoint is crucial. Along these lines, we describe a general methodology for backdooring CNNs via targeted weight perturbations. Using a five-layer CNN and ResNet-50 as case studies, we show that an attacker can significantly increase the chance that inputs they supply will be falsely accepted by a CNN while simultaneously preserving the error rates for legitimate enrolled classes.
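The abstract describes the attack only at a high level. As a rough illustration, the sketch below (PyTorch) shows one way a targeted weight perturbation of this kind could be realized: greedily nudge the final-layer weights that most increase the target-class logit for the attacker's input, and revert any edit that measurably degrades accuracy on legitimately enrolled data. The layer name `model.fc`, the helper `evaluate`, and all thresholds are hypothetical; this is a minimal sketch under stated assumptions, not the paper's exact search procedure.

```python
import torch

@torch.no_grad()
def evaluate(model, loader):
    """Top-1 accuracy over legitimate (enrolled) validation data."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

def backdoor_weights(model, attacker_x, target_class, val_loader,
                     step=0.05, max_edits=200, max_acc_drop=0.005):
    """Greedily perturb individual final-layer weights until attacker_x
    is accepted as target_class, reverting any edit that costs legitimate
    accuracy more than max_acc_drop. All parameters are illustrative."""
    layer = model.fc                          # assumed final nn.Linear
    base_acc = evaluate(model, val_loader)

    # The gradient of the target logit w.r.t. the final-layer weights
    # identifies which weights most raise acceptance of the attacker input.
    model.zero_grad()
    model(attacker_x.unsqueeze(0))[0, target_class].backward()
    grad = layer.weight.grad.detach().view(-1)

    flat = layer.weight.data.view(-1)         # view: edits hit the model
    order = grad.abs().argsort(descending=True)
    with torch.no_grad():
        for idx in order[:max_edits]:
            old = flat[idx].item()
            flat[idx] = old + step * torch.sign(grad[idx])
            if evaluate(model, val_loader) < base_acc - max_acc_drop:
                flat[idx] = old               # revert: hurts enrolled users
            elif model(attacker_x.unsqueeze(0)).argmax().item() == target_class:
                return True                   # backdoor installed
    return False
```

Restricting edits to the final layer is one plausible design choice here: it keeps the perturbation footprint small, which is consistent with the abstract's goal of preserving error rates for legitimate enrolled classes.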
