HyperNetworks with statistical filtering for defending adversarial examples

Deep learning algorithms are known to be vulnerable to adversarial perturbations in tasks such as image classification. This problem has previously been addressed with defense methods that detect and reject particular types of attacks. However, training and manipulating networks according to such defense schemes increases the computational complexity of the learning algorithms. In this work, we propose a simple yet effective method for improving the robustness of convolutional neural networks (CNNs) to adversarial attacks by using data-dependent adaptive convolution kernels. To this end, we propose a new type of HyperNetwork that employs statistical properties of the input data and features to compute statistical adaptive maps. We then filter the convolution weights of CNNs with the learned statistical maps to obtain dynamic kernels. Thereby, weights and kernels are jointly optimized to learn image classification models that are robust to adversarial attacks without employing additional detection and rejection algorithms. We empirically demonstrate that the proposed method enables CNNs to spontaneously defend against different types of attacks, e.g. attacks generated by Gaussian noise, fast gradient sign methods (Goodfellow et al., 2014), and a black-box attack (Narodytska & Kasiviswanathan, 2016).
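
To make the statistical filtering idea concrete, the following is a minimal sketch in PyTorch of a convolution layer whose kernels are filtered by a hypernetwork driven by per-sample input statistics. The module name StatisticalFilteringConv2d, the choice of per-channel mean and standard deviation as the statistics, and the sigmoid-gated two-layer hypernetwork are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StatisticalFilteringConv2d(nn.Module):
    """Sketch of a conv layer whose kernels are filtered by a hypernetwork
    driven by per-sample input statistics (illustrative, not the paper's
    exact design)."""

    def __init__(self, in_channels, out_channels, kernel_size, hidden_dim=32, padding=0):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.padding = padding
        # Static convolution weights shared across all inputs.
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size) * 0.01
        )
        self.bias = nn.Parameter(torch.zeros(out_channels))
        # Hypernetwork: maps per-channel input statistics (mean and std,
        # 2 * in_channels values) to one multiplicative gate per output
        # channel, i.e. a "statistical adaptive map".
        self.hyper = nn.Sequential(
            nn.Linear(2 * in_channels, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_channels),
            nn.Sigmoid(),  # keep the filtering map in (0, 1)
        )

    def forward(self, x):
        batch, _, height, width = x.shape
        # Per-sample, per-channel statistics of the incoming features.
        mean = x.mean(dim=(2, 3))                            # (B, C_in)
        std = x.std(dim=(2, 3), unbiased=False)              # (B, C_in)
        maps = self.hyper(torch.cat([mean, std], dim=1))     # (B, C_out)
        # Filter the static weights with the statistical map to obtain
        # data-dependent (dynamic) kernels, one set per sample.
        dyn_w = self.weight.unsqueeze(0) * maps.view(batch, -1, 1, 1, 1)
        # Grouped-convolution trick: apply all per-sample kernels in one call.
        x = x.reshape(1, batch * self.in_channels, height, width)
        dyn_w = dyn_w.reshape(batch * self.out_channels, self.in_channels,
                              self.kernel_size, self.kernel_size)
        out = F.conv2d(x, dyn_w, padding=self.padding, groups=batch)
        out = out.reshape(batch, self.out_channels, out.size(2), out.size(3))
        return out + self.bias.view(1, -1, 1, 1)


if __name__ == "__main__":
    layer = StatisticalFilteringConv2d(3, 16, kernel_size=3, padding=1)
    images = torch.randn(4, 3, 32, 32)
    print(layer(images).shape)  # torch.Size([4, 16, 32, 32])
```

In this sketch the static weights and the hypernetwork parameters are trained jointly by ordinary backpropagation, which mirrors the collective optimization of weights and dynamic kernels described above.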

[1] Seyed-Mohsen Moosavi-Dezfooli et al. DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Samy Bengio et al. Adversarial examples in the physical world, 2016, ICLR.

[3] Jan Hendrik Metzen et al. On Detecting Adversarial Perturbations, 2017, ICLR.

[4] Ananthram Swami et al. The Limitations of Deep Learning in Adversarial Settings, 2015, 2016 IEEE European Symposium on Security and Privacy (EuroS&P).

[5] Yang Song et al. Improving the Robustness of Deep Neural Networks via Stability Training, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Michael S. Bernstein et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.

[7] Ananthram Swami et al. Practical Black-Box Attacks against Machine Learning, 2016, AsiaCCS.

[8] Quoc V. Le et al. HyperNetworks, 2016, ICLR.

[9] Eduardo Valle et al. Exploring the space of adversarial images, 2015, 2016 International Joint Conference on Neural Networks (IJCNN).

[10] Seyed-Mohsen Moosavi-Dezfooli et al. Robustness of classifiers: from adversarial to random noise, 2016, NIPS.

[11] Jonathon Shlens et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[12] Dumitru Erhan et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Jian Sun et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Andrew Zisserman et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[15] Joan Bruna et al. Intriguing properties of neural networks, 2013, ICLR.

[16] Alhussein Fawzi et al. A geometric perspective on the robustness of deep networks, 2017.

[17] Geoffrey E. Hinton et al. ImageNet classification with deep convolutional neural networks, 2012, Communications of the ACM.

[18] Luc Van Gool et al. Dynamic Filter Networks, 2016, NIPS.

[19] Dan Boneh et al. Ensemble Adversarial Training: Attacks and Defenses, 2017, ICLR.

[20] Nina Narodytska et al. Simple Black-Box Adversarial Perturbations for Deep Networks, 2016, arXiv.

[21] Eugenio Culurciello et al. Robust Convolutional Neural Networks under Adversarial Noise, 2015, arXiv.

[22] Ananthram Swami et al. Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks, 2015, 2016 IEEE Symposium on Security and Privacy (SP).