Hidden Trigger Backdoor Attacks

With the success of deep learning algorithms in various domains, studying adversarial attacks that threaten deep models in real-world applications has become an important research topic. Backdoor attacks are a form of adversarial attack on deep networks in which the attacker provides poisoned data for the victim to train the model on, and then activates the attack at test time by presenting a specific small trigger pattern. Most state-of-the-art backdoor attacks either supply mislabeled poisoned data that can be identified by visual inspection, reveal the trigger in the poisoned data, or rely on noise to hide the trigger. We propose a novel form of backdoor attack in which the poisoned data look natural and carry correct labels, and, more importantly, the attacker hides the trigger in the poisoned data and keeps it secret until test time. We perform an extensive study on various image classification settings and show that our attack can fool the model by pasting the trigger at random locations on unseen images, even though the model performs well on clean data. We also show that our proposed attack cannot be easily defended against using a state-of-the-art defense algorithm for backdoor attacks.
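To make the test-time activation concrete, the snippet below is a minimal sketch of pasting a small trigger patch at a random location on an unseen image, as described in the abstract. It assumes images are (C, H, W) tensors in PyTorch; the function name, patch size, and value ranges are illustrative choices, not taken from the paper's released code.

```python
# Illustrative sketch (assumptions, not the paper's code): paste a small square
# trigger patch at a random location on an image tensor, as the attacker would
# do to activate the backdoor at test time.
import torch

def paste_trigger(image: torch.Tensor, trigger: torch.Tensor) -> torch.Tensor:
    """Paste `trigger` (C, h, w) onto `image` (C, H, W) at a random location."""
    _, H, W = image.shape
    _, h, w = trigger.shape
    assert h <= H and w <= W, "trigger must fit inside the image"
    # Sample the top-left corner of the patch uniformly at random.
    top = torch.randint(0, H - h + 1, (1,)).item()
    left = torch.randint(0, W - w + 1, (1,)).item()
    patched = image.clone()
    patched[:, top:top + h, left:left + w] = trigger
    return patched

# Example: a 30x30 trigger pasted onto a 224x224 RGB image (sizes are assumptions).
image = torch.rand(3, 224, 224)
trigger = torch.rand(3, 30, 30)
triggered_image = paste_trigger(image, trigger)
```

At training time, by contrast, the poisoned images shown to the victim contain no visible trigger at all; only at test time is the secret patch pasted onto source-class images to flip their predictions.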
