Adversarial Attacks on Variational Autoencoders

Adversarial attacks are malicious inputs that derail machine-learning models. We propose a scheme to attack autoencoders, along with a quantitative evaluation framework that correlates well with qualitative assessments of the attacks. In statistically validated experiments, we assess the resistance to attack of three variational autoencoders (simple, convolutional, and DRAW) on three datasets (MNIST, SVHN, and CelebA), showing that both DRAW's recurrence and its attention mechanism lead to better resistance. As autoencoders have been proposed for data compression, a scenario in which their security is paramount, we expect adversarial attacks on them to receive growing attention.
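To make the attack setting concrete, below is a minimal sketch of one common latent-space attack on a variational autoencoder: a distortion d is optimized so that the encoding of the attacked input x + d approaches the encoding of a chosen target image, while a regularizer keeps the distortion small. This is an illustrative formulation, not necessarily the paper's exact one; the `encoder` callable, the L2 latent distance, the weight `C`, and the use of Adam are all assumptions for the sketch.

```python
import torch

def latent_attack(encoder, x, x_target, C=1.0, steps=500, lr=1e-2):
    """Optimize a distortion d so that encoder(x + d) approaches
    encoder(x_target), while C * ||d||^2 keeps d small.
    All image tensors are assumed to lie in [0, 1]."""
    z_target = encoder(x_target).detach()        # fixed target latent code
    d = torch.zeros_like(x, requires_grad=True)  # adversarial distortion
    opt = torch.optim.Adam([d], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x_adv = (x + d).clamp(0.0, 1.0)          # stay a valid image
        z_adv = encoder(x_adv)
        # Trade attack strength against distortion visibility.
        loss = (z_adv - z_target).pow(2).sum() + C * d.pow(2).sum()
        loss.backward()
        opt.step()
    return d.detach()
```

The weight C governs the trade-off between how closely the adversarial reconstruction matches the target and how visible the distortion is; sweeping C traces out the attack's operating curve, which is the kind of behavior a quantitative evaluation framework can measure.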
