On the Detection of Adversarial Attacks against Deep Neural Networks

Deep learning models have been widely studied and shown to achieve high accuracy in various pattern recognition tasks, especially image recognition. However, owing to their non-linear architecture and high-dimensional inputs, their ill-posedness with respect to adversarial perturbations [1] (small, deliberately crafted perturbations of the input that lead to completely different outputs) has also attracted researchers' attention. This work takes the traffic sign recognition system of a self-driving car as an example and aims to design an additional mechanism that improves the robustness of the recognition system. A machine learning model learns from the deep learning model's predictions, using human feedback as labels, and provides a credibility score for the current prediction. The mechanism uses both the input image and the recognition result as the sample space, queries a human user for the True/False of the current classification result as few times as possible, and thereby detects adversarial attacks.
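The mechanism described above (a credibility model trained on human True/False feedback, queried as sparingly as possible) can be sketched as an active-learning loop with uncertainty sampling. The following is a minimal, hypothetical illustration, not the paper's implementation: the feature vectors, the `query_human` oracle, and the query budget are all assumptions introduced for the sketch.

```python
import math
import random

random.seed(0)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class CredibilityModel:
    """Online logistic model scoring how credible a classifier's prediction is.

    Hypothetical sketch: each sample is a feature vector derived from the
    input image and the recognition result; labels are the human user's
    True/False feedback on past predictions.
    """

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        # credibility score in (0, 1)
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return sigmoid(z)

    def update(self, x, y):
        # one step of online gradient descent on the log-loss
        p = self.predict(x)
        g = p - y
        self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * g


def query_human(x):
    # Stand-in oracle for the human user's True/False answer. In this toy
    # setup a prediction counts as credible when its first feature
    # (e.g. the classifier's confidence) is high.
    return 1 if x[0] > 0.5 else 0


# Pool of unlabeled feature vectors from past classifier predictions.
pool = [(random.random(), random.random()) for _ in range(200)]

model = CredibilityModel(n_features=2)
budget = 30  # query the human as few times as possible

for _ in range(budget):
    # Uncertainty sampling: ask about the sample whose credibility
    # the model is least sure of (score closest to 0.5).
    x = min(pool, key=lambda s: abs(model.predict(s) - 0.5))
    pool.remove(x)
    model.update(x, query_human(x))

# Low credibility scores flag predictions as likely adversarial.
flagged = [x for x in pool if model.predict(x) < 0.5]
```

Uncertainty sampling is one standard active-learning strategy (cf. the survey in [8]); it concentrates the limited human queries on the samples where a label is most informative, which matches the goal of minimizing the number of True/False questions asked.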

[1] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[2] Luc Van Gool, et al. Multi-view traffic sign detection, recognition, and 3D localisation, 2014, Machine Vision and Applications.

[3] Geoffrey E. Hinton, et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.

[4] Quanyan Zhu, et al. A game-theoretic defense against data poisoning attacks in distributed support vector machines, 2017, 2017 IEEE 56th Annual Conference on Decision and Control (CDC).

[5] Rui Zhang, et al. Secure and resilient distributed machine learning under adversarial environments, 2015, 2015 18th International Conference on Information Fusion (Fusion).

[6] Rui Zhang, et al. A game-theoretic analysis of label flipping attacks on distributed support vector machines, 2017, 2017 51st Annual Conference on Information Sciences and Systems (CISS).

[7] Samy Bengio, et al. Adversarial examples in the physical world, 2016, ICLR.

[8] Burr Settles, et al. Active Learning Literature Survey, 2009.

[9] Heike Freud, et al. On Line Learning In Neural Networks, 2016.

[10] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[11] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[12] Ananthram Swami, et al. The Limitations of Deep Learning in Adversarial Settings, 2015, 2016 IEEE European Symposium on Security and Privacy (EuroS&P).

[13] Eldad Haber, et al. Stable architectures for deep neural networks, 2017, ArXiv.