Robust Perception through Analysis by Synthesis

The intriguing susceptibility of deep neural networks to minimal input perturbations suggests that the gap between human and machine perception is still large. We argue that, despite much effort, even on MNIST the most successful defenses remain far from the robustness of human perception. We therefore reconsider MNIST and establish a novel defense inspired by the abundant feedback connections in the human visual cortex. We suggest that this feedback plays a role in estimating the likelihood of a sensory stimulus with respect to the hidden causes inferred by the cortex, allowing the brain to mute distracting patterns. We implement this analysis-by-synthesis idea as a discriminatively fine-tuned Bayesian classifier built on a set of class-conditional variational autoencoders (VAEs). To evaluate model robustness we go to great lengths to find maximally effective adversarial attacks, including decision-based, score-based and gradient-based attacks. The results suggest that this ansatz yields state-of-the-art robustness on MNIST against L0, L2 and L∞ perturbations, and we demonstrate that most adversarial examples are strongly perturbed towards the perceptual boundary between the original and the adversarial class.
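To make the model concrete, below is a minimal sketch of such a classifier, assuming one small VAE per class and test-time inference by optimizing the latent code to bound the class-conditional likelihood. The architecture, the names `ClassVAE`, `elbo`, and `class_log_likelihoods`, and all hyperparameters are illustrative assumptions, not the authors' implementation (the discriminative fine-tuning step is omitted here).

```python
# Minimal sketch of a Bayesian classifier built from class-conditional VAEs,
# in the spirit of the analysis-by-synthesis model described above.
# All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 8

class ClassVAE(nn.Module):
    """One VAE per digit class; its decoder synthesizes candidate inputs."""
    def __init__(self):
        super().__init__()
        # Encoder is used only for training the VAE (training not shown).
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ELU(),
                                 nn.Linear(256, 2 * LATENT_DIM))
        self.dec = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ELU(),
                                 nn.Linear(256, 784))

    def decode(self, z):
        return self.dec(z)  # Bernoulli logits over pixels

def elbo(vae, x, z):
    """Evidence lower bound on log p(x | class) for a given latent z."""
    logits = vae.decode(z)
    rec = -F.binary_cross_entropy_with_logits(
        logits, x.flatten(1), reduction='none').sum(1)
    kl = 0.5 * (z ** 2).sum(1)  # KL term against a standard-normal prior
    return rec - kl

def class_log_likelihoods(vaes, x, steps=50, lr=0.05):
    """For each class-conditional VAE, optimize the latent code to get a
    tight lower bound on log p(x | y): inference as analysis by synthesis."""
    scores = []
    for vae in vaes:
        z = torch.zeros(x.size(0), LATENT_DIM, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            loss = -elbo(vae, x, z).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
        scores.append(elbo(vae, x, z).detach())
    return torch.stack(scores, dim=1)  # shape: (batch, num_classes)

# Classification: softmax over per-class likelihood bounds (uniform prior).
# vaes = [ClassVAE() for _ in range(10)]  # one per MNIST class, pre-trained
# posterior = F.softmax(class_log_likelihoods(vaes, x_batch), dim=1)
```

The key design point is that the class score is obtained by synthesizing the input under each class hypothesis, so a perturbation only succeeds if it makes the input genuinely more explainable by the wrong class.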
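The gradient-based family of attacks mentioned above can be illustrated with standard projected gradient descent (PGD) under an L∞ constraint, in the style of Madry et al.; the function `pgd_linf` and its parameters are assumptions for illustration, not the paper's attack suite.

```python
# Minimal sketch of a gradient-based L-infinity attack (projected gradient
# descent), one example of the attack families mentioned above. Step size,
# epsilon, and iteration count are illustrative assumptions.
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=0.3, alpha=0.01, steps=40):
    """Maximize the classification loss subject to ||x_adv - x||_inf <= eps."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Gradient-sign ascent step, then project back into the eps-ball
        # around the original image and into the valid pixel range.
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv
```

For a model whose inference is itself an inner optimization, as in the sketch above, such gradients can be unreliable, which is one reason the evaluation complements them with decision-based and score-based attacks.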
