Autoencoding beyond pixels using a learned similarity metric

We present an autoencoder that leverages learned representations to better measure similarities in data space. By combining a variational autoencoder with a generative adversarial network we can use learned feature representations in the GAN discriminator as basis for the VAE reconstruction objective. Thereby, we replace element-wise errors with feature-wise errors to better capture the data distribution while offering invariance towards e.g. translation. We apply our method to images of faces and show that it outperforms VAEs with element-wise similarity measures in terms of visual fidelity. Moreover, we show that the method learns an embedding in which high-level abstract visual features (e.g. wearing glasses) can be modified using simple arithmetic.

[1]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[2]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[3]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[4]  Alan C. Bovik,et al.  Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures , 2009, IEEE Signal Processing Magazine.

[5]  Shree K. Nayar,et al.  Attribute and simile classifiers for face verification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[6]  Pascal Vincent,et al.  Quickly Generating Representative Samples from an RBM-Derived Process , 2011, Neural Computation.

[7]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[9]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[10]  Cheng Li,et al.  Transferring Landmark Annotations for Cross-Dataset Face Alignment , 2014, ArXiv.

[11]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[12]  Anders Boesen Lindbo Larsen CUDArray: CUDA-based NumPy , 2014 .

[13]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[14]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[15]  Rob Fergus,et al.  Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.

[16]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[17]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  Joshua B. Tenenbaum,et al.  Deep Convolutional Inverse Graphics Network , 2015, NIPS.

[19]  Tapani Raiko,et al.  Semi-supervised Learning with Ladder Networks , 2015, NIPS.

[20]  Yuting Zhang,et al.  Deep Visual Analogy-Making , 2015, NIPS.

[21]  Thomas Brox,et al.  Learning to generate chairs with convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Yann LeCun,et al.  Stacked What-Where Auto-encoders , 2015, ArXiv.

[23]  Leon A. Gatys,et al.  A Neural Algorithm of Artistic Style , 2015, ArXiv.

[24]  Ruslan Salakhutdinov,et al.  Generating Images from Captions with Attention , 2015, ICLR.

[25]  Yann LeCun,et al.  Deep multi-scale video prediction beyond mean square error , 2015, ICLR.

[26]  Honglak Lee,et al.  Attribute2Image: Conditional Image Generation from Visual Attributes , 2015, ECCV.

[27]  Matthias Bethge,et al.  A note on the evaluation of generative models , 2015, ICLR.

[28]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[29]  Renjie Liao,et al.  Learning to generate images with perceptual similarity metrics , 2015, 2017 IEEE International Conference on Image Processing (ICIP).