SentiNet: Detecting Localized Universal Attacks Against Deep Learning Systems

SentiNet is a novel detection framework for localized universal attacks on neural networks. These attacks restrict adversarial noise to a contiguous region of an image and are reusable across different images, constraints that make them well suited to physically realizable attacks. Unlike most prior work on adversarial detection, SentiNet requires neither training a model nor prior knowledge of an attack before detection. This property is appealing because of the large number of possible mechanisms and attack vectors that an attack-specific defense would otherwise have to consider. By exploiting the very susceptibility of neural networks to attacks, and by using techniques from model interpretability and object detection as detection mechanisms, SentiNet turns a weakness of a model into a strength. We demonstrate the effectiveness of SentiNet on three different attacks, namely data poisoning attacks, trojaned networks, and adversarial patches (including physically realizable attacks), and show that our defense achieves competitive performance metrics against all three threats. Finally, we show that SentiNet is robust against strong adaptive adversaries who craft adversarial patches that specifically target the components of SentiNet's architecture.
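The abstract does not spell out the detection pipeline, but the reference to model interpretability suggests a Grad-CAM-style procedure: extract the salient region that drives the suspicious prediction, paste it onto held-out benign images, and flag inputs whose extracted region hijacks the prediction on many unrelated images. The sketch below illustrates that idea only; the PyTorch classifier, its final convolutional layer target_layer, the suspicious input x, the benign_batch of held-out images, and the 0.5 saliency threshold are all illustrative assumptions rather than details taken from the paper.

    # Minimal sketch (assumed details, see note above): Grad-CAM saliency plus an overlay test.
    import torch
    import torch.nn.functional as F

    def gradcam_mask(model, target_layer, x, threshold=0.5):
        """Return a binary saliency mask and the predicted class for input x (shape 1xCxHxW)."""
        acts, grads = {}, {}
        h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
        h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
        logits = model(x)
        cls = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, cls].backward()          # gradients of the predicted-class score
        h1.remove(); h2.remove()
        # Grad-CAM: channel weights are the spatially averaged gradients.
        weights = grads["g"].mean(dim=(2, 3), keepdim=True)
        cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
        return (cam >= threshold).float(), cls

    def overlay_fool_rate(model, x, mask, cls, benign_batch):
        """Paste the salient region of x onto benign images and measure how often
        the prediction flips to x's class, a proxy for a reusable localized attack."""
        pasted = benign_batch * (1 - mask) + x * mask
        preds = model(pasted).argmax(dim=1)
        return (preds == cls).float().mean().item()

Under these assumptions, an input whose extracted region flips the prediction on many unrelated benign images behaves like a localized universal attack and can be flagged, whereas the salient region of a benign input (for example, the actual face or street sign) typically does not transfer in this way.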
