Towards Safe Deep Learning: Unsupervised Defense Against Generic Adversarial Attacks

Recent advances in adversarial deep learning (DL) have exposed a new and largely unexplored attack surface that jeopardizes the integrity of autonomous DL systems. We introduce an automated countermeasure called Parallel Checkpointing Learners (PCL) that thwarts potential adversarial attacks and significantly improves the reliability (safety) of a victim DL model. The PCL methodology is unsupervised: no adversarial samples are used to build or train the parallel checkpointing learners. We formalize the goal of preventing adversarial attacks as an optimization problem that minimizes the rarely observed regions in the latent feature space spanned by a DL network. To solve this minimization problem, a set of complementary but disjoint checkpointing modules is trained and used to validate the victim model's execution in parallel. Each checkpointing learner explicitly characterizes the geometry of the input data and of the corresponding high-level data abstractions within a particular DL layer; as a result, an adversary must deceive all defender modules simultaneously in order to succeed. We evaluate PCL against state-of-the-art attacks, including the Fast Gradient Sign (FGS) method, the Jacobian-based Saliency Map Attack (JSMA), DeepFool, and the Carlini & Wagner L2 attack. Proof-of-concept evaluations on the MNIST, CIFAR10, and ImageNet datasets corroborate the effectiveness of the proposed defense against adversarial samples.
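To make the parallel-validation idea concrete, the sketch below illustrates one plausible realization: each checkpointing learner models the legitimate-data geometry at one layer of the victim network via a simple PCA reconstruction-error check, and a sample is accepted only if every checkpoint validates it. The module type (linear subspace model), the layer-activation extraction hook, and the percentile threshold are illustrative assumptions, not details taken from the paper.

```python
# Minimal PCL-style validation sketch (illustrative, not the authors' exact design).
import numpy as np

class CheckpointingLearner:
    """Flags activations that fall in rarely observed regions of one layer's
    feature space, using a linear subspace fit to legitimate data only."""

    def __init__(self, n_components=32, percentile=99.0):
        self.n_components = n_components
        self.percentile = percentile

    def fit(self, activations):
        # activations: (N, D) array of layer outputs for legitimate training data
        self.mean_ = activations.mean(axis=0)
        centered = activations - self.mean_
        # principal subspace spanned by legitimate activations
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        self.components_ = vt[: self.n_components]
        errors = self._reconstruction_error(activations)
        # accept anything whose error is typical of legitimate samples
        self.threshold_ = np.percentile(errors, self.percentile)
        return self

    def _reconstruction_error(self, activations):
        centered = activations - self.mean_
        projected = centered @ self.components_.T @ self.components_
        return np.linalg.norm(centered - projected, axis=1)

    def is_legitimate(self, activations):
        return self._reconstruction_error(activations) <= self.threshold_


def pcl_validate(sample, victim_layer_activations, checkpoints):
    """Accept the victim model's prediction only if every parallel checkpoint
    deems the sample's per-layer activations legitimate.
    victim_layer_activations: hypothetical hook returning a list of (1, D_l) arrays."""
    layer_acts = victim_layer_activations(sample)
    return all(cp.is_legitimate(a).all() for cp, a in zip(checkpoints, layer_acts))
```

Under this sketch, an adversarial perturbation that fools the victim classifier must also keep every monitored layer's activations inside the legitimate subspace of each defender, which is the "deceive all defender modules simultaneously" requirement described above.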
