Adversarial Images for Variational Autoencoders

We investigate adversarial attacks against autoencoders. We propose a procedure that distorts the input image to mislead the autoencoder into reconstructing a completely different target image. We attack the internal latent representations, attempting to make the adversarial input produce an internal representation as similar as possible to the target's. We find that autoencoders are much more robust to the attack than classifiers: while some examples achieve tolerably small input distortion and reasonable similarity to the target image, there is a quasi-linear trade-off between those two aims. We report results on the MNIST and SVHN datasets, and also test regular deterministic autoencoders, reaching similar conclusions in all cases. Finally, we show that the usual adversarial attack on classifiers, while much easier to mount, also exhibits a direct proportionality between the distortion of the input and the misdirection of the output. That proportionality, however, is hidden by the normalization of the output, which maps the linear activations of the final layer into non-linear probabilities.
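
To make the latent attack concrete, it can be cast as a regularized optimization problem. The formulation below is a sketch in our own notation, not quoted from the paper: we write q(z | x) for the VAE encoder's approximate posterior, D for a divergence between posteriors (e.g., the KL divergence), and C for a regularization constant that trades off input distortion against latent similarity:

    \min_{d} \; D\big( q(z \mid x + d) \,\|\, q(z \mid x_t) \big) \;+\; C \,\lVert d \rVert_2^2
    \quad \text{subject to} \quad L \le x + d \le U,

where x is the original image, x_t the target image, d the adversarial distortion, and [L, U] the valid pixel range. Sweeping C traces out the quasi-linear trade-off between input distortion and similarity to the target described above.

A minimal gradient-based sketch of this attack follows, in PyTorch. The encoder interface, the L2 distance between latent means (standing in for a divergence between full posteriors), and all hyper-parameter values are illustrative assumptions, not the paper's exact setup:

    import torch

    def latent_attack(encode, x, x_target, C=1e-2, steps=500, lr=1e-2, lo=0.0, hi=1.0):
        """Distort x so that its latent code approaches that of x_target.

        `encode` is assumed to map an image tensor to the mean of q(z | x).
        Returns the adversarial image, clipped to the valid pixel range.
        """
        d = torch.zeros_like(x, requires_grad=True)    # adversarial distortion
        opt = torch.optim.Adam([d], lr=lr)
        z_target = encode(x_target).detach()           # fixed target latent code
        for _ in range(steps):
            opt.zero_grad()
            z = encode(torch.clamp(x + d, lo, hi))     # latent code of distorted input
            latent_dist = (z - z_target).pow(2).sum()  # proxy for the posterior divergence
            loss = latent_dist + C * d.pow(2).sum()    # penalize large input distortion
            loss.backward()
            opt.step()
        return torch.clamp(x + d, lo, hi).detach()

Smaller values of C favor latent similarity at the cost of larger, more visible distortions, which is exactly the trade-off the abstract describes.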
