Paper information - Autoencoding beyond pixels using a learned similarity metric

Autoencoding beyond pixels using a learned similarity metric

We present an autoencoder that leverages learned representations to better measure similarities in data space. By combining a variational autoencoder with a generative adversarial network we can use learned feature representations in the GAN discriminator as basis for the VAE reconstruction objective. Thereby, we replace element-wise errors with feature-wise errors to better capture the data distribution while offering invariance towards e.g. translation. We apply our method to images of faces and show that it outperforms VAEs with element-wise similarity measures in terms of visual fidelity. Moreover, we show that the method learns an embedding in which high-level abstract visual features (e.g. wearing glasses) can be modified using simple arithmetic.
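The two ideas in the abstract, replacing element-wise errors with feature-wise errors and editing attributes by simple arithmetic in the embedding, can be illustrated with a small NumPy sketch. This is not the authors' implementation: `pooled_features` is a hypothetical, fixed stand-in for a learned GAN discriminator layer (block-wise average pooling), and `attribute_vector` assumes the attribute direction is the difference between the mean embeddings of examples with and without the attribute. All function names and toy values here are assumptions made for illustration.

```python
import numpy as np


def elementwise_error(x, x_rec):
    """Mean squared error computed pixel by pixel."""
    return float(np.mean((x - x_rec) ** 2))


def pooled_features(x, block=4):
    """Hypothetical stand-in for a discriminator layer.

    Averages the image over non-overlapping block x block tiles.
    In the paper the features are learned; this fixed map only
    illustrates why feature-space errors can tolerate small shifts.
    """
    h, w = x.shape
    return x.reshape(h // block, block, w // block, block).mean(axis=(1, 3))


def featurewise_error(x, x_rec, features=pooled_features):
    """Mean squared error computed between feature representations."""
    return float(np.mean((features(x) - features(x_rec)) ** 2))


def attribute_vector(z_with, z_without):
    """Assumed attribute direction: difference of mean embeddings."""
    return z_with.mean(axis=0) - z_without.mean(axis=0)


# A single bright pixel, and the same image shifted one pixel to the right.
x = np.zeros((8, 8))
x[1, 1] = 1.0
x_shift = np.roll(x, 1, axis=1)  # bright pixel moves from (1, 1) to (1, 2)

pixel_err = elementwise_error(x, x_shift)    # penalizes the shift
feature_err = featurewise_error(x, x_shift)  # shift stays inside one tile

# Toy 2-D embeddings for examples with and without an attribute.
z_with = np.array([[1.0, 2.0], [3.0, 2.0]])
z_without = np.array([[0.0, 1.0], [2.0, 1.0]])
direction = attribute_vector(z_with, z_without)

# Adding the direction to an embedding is the "simple arithmetic" edit;
# in a real model the result would then be passed through the decoder.
z = np.array([0.5, -0.5])
z_edited = z + direction

print(pixel_err, feature_err, z_edited)
```

In this toy setting the one-pixel shift produces a nonzero element-wise error but a zero error in the pooled feature space, which is the kind of invariance the abstract refers to. A learned discriminator representation would not be exactly invariant in general.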
