A Unified Gradient Regularization Family for Adversarial Examples

Adversarial examples are augmented data points generated by adding imperceptible perturbations to input samples. They have recently drawn much attention within the machine learning and data mining communities. Being difficult to distinguish from real examples, such adversarial examples can change the predictions of many of the best learning models, including state-of-the-art deep learning models. Recent attempts have been made to build models that are robust to adversarial examples; however, these methods either lead to performance drops or lack a clear mathematical motivation. In this paper, we propose a unified framework for building machine learning models that are robust against adversarial examples. More specifically, using the unified framework, we develop a family of gradient regularization methods that effectively penalize the gradient of the loss function with respect to the inputs. Our proposed framework is appealing in that it offers a unified view for dealing with adversarial examples and incorporates another recently proposed perturbation-based approach as a special case. In addition, we present visualizations that reveal semantic meaning in these perturbations, which support our regularization method and provide another explanation for the generalizability of adversarial examples. By applying this technique to Maxout networks, we conduct a series of experiments and achieve encouraging results on two benchmark datasets. In particular, we attain the best accuracy on MNIST (without data augmentation) and competitive performance on CIFAR-10.
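To illustrate the core idea of penalizing the gradient of the loss with respect to the inputs, the following is a minimal sketch in PyTorch. It is not the paper's exact formulation: the generic classifier `model`, the weight `lam`, and the choice of an L2 norm on the input gradient are illustrative assumptions; the paper's family of regularizers covers other norms as special cases.

```python
# Minimal sketch (assumptions: a generic classifier `model`, an L2 penalty on the
# input gradient, and a weighting coefficient `lam`; the regularization family in
# the paper also admits other norms).
import torch
import torch.nn.functional as F

def gradient_regularized_loss(model, x, y, lam=0.1):
    """Cross-entropy loss plus a penalty on the gradient of the loss w.r.t. the input."""
    x = x.clone().requires_grad_(True)            # track gradients w.r.t. the input
    base_loss = F.cross_entropy(model(x), y)      # standard classification loss
    # Gradient of the loss with respect to the input; create_graph=True keeps the
    # penalty differentiable so it can be back-propagated through the model weights.
    grad_x, = torch.autograd.grad(base_loss, x, create_graph=True)
    penalty = grad_x.flatten(1).norm(p=2, dim=1).mean()
    return base_loss + lam * penalty
```

In training, this regularized loss simply replaces the plain classification loss; a small input-gradient norm means that imperceptible perturbations of the input can change the loss only slightly, which is the intuition behind robustness to adversarial examples.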
