Non-Adversarial Image Synthesis With Generative Latent Nearest Neighbors

Unconditional image generation has recently been dominated by generative adversarial networks (GANs). GAN methods train a generator, which regresses images from random noise vectors, together with a discriminator that attempts to differentiate between the generated images and a training set of real images. GANs have shown remarkable results at generating realistic-looking images. Despite their success, GANs suffer from critical drawbacks, including unstable training and mode-dropping. These weaknesses have motivated research into alternatives, including variational auto-encoders (VAEs), latent embedding learning methods (e.g., GLO), and nearest-neighbor-based implicit maximum likelihood estimation (IMLE). Unfortunately, GANs still significantly outperform these alternative methods for image generation at present. In this work, we present Generative Latent Nearest Neighbors (GLANN), a novel method for training generative models without adversarial training. GLANN combines the strengths of IMLE and GLO in a way that overcomes the main drawbacks of each method. Consequently, GLANN generates images that are far better than those produced by GLO and IMLE. Our method does not suffer from the mode collapse that plagues GAN training and is much more stable. Qualitative results show that GLANN outperforms a baseline consisting of 800 GANs and VAEs on commonly used datasets. Our models are also shown to be effective for training truly non-adversarial unsupervised image translation.
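The abstract describes GLANN as a combination of GLO (learned per-image latent codes) and IMLE (nearest-neighbor matching against sampled outputs). The sketch below is one plausible, minimal illustration of such a two-stage pipeline in PyTorch, not the authors' implementation: a GLO stage that jointly optimizes a generator and one latent code per image, an IMLE-style stage that trains a noise-to-latent mapping so that every learned code has a nearby mapped sample, and a sampling routine that chains the two. The module names (G, T), the plain L2 reconstruction loss (the paper relies on a perceptual loss), and all hyper-parameters are assumptions made for brevity.

import torch
import torch.nn as nn

def project_to_ball(z):
    # GLO-style constraint: keep latent codes inside the unit ball.
    return z / z.norm(dim=1, keepdim=True).clamp(min=1.0)

def train_glo(G, images, z_dim=16, epochs=200, lr=1e-3):
    # Stage 1 (GLO): jointly optimize the generator and one latent code per image.
    z = nn.Parameter(0.01 * torch.randn(images.size(0), z_dim))
    opt = torch.optim.Adam(list(G.parameters()) + [z], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        recon = G(project_to_ball(z))
        loss = ((recon - images) ** 2).mean()  # placeholder for the paper's perceptual loss
        loss.backward()
        opt.step()
    return project_to_ball(z).detach()

def train_imle(T, latents, e_dim=16, epochs=200, samples_per_point=10, lr=1e-3):
    # Stage 2 (IMLE over the latent space): train T so every GLO latent code
    # has a nearby mapped noise sample.
    opt = torch.optim.Adam(T.parameters(), lr=lr)
    for _ in range(epochs):
        e = torch.randn(samples_per_point * latents.size(0), e_dim)
        mapped = T(e)
        nearest = torch.cdist(latents, mapped).min(dim=1).values  # distance to nearest mapped sample
        loss = (nearest ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

def sample(G, T, n, e_dim=16):
    # Sampling: noise -> latent (via T) -> image (via G).
    with torch.no_grad():
        return G(project_to_ball(T(torch.randn(n, e_dim))))

if __name__ == "__main__":
    # Toy demo on random 64-dimensional vectors standing in for flattened images.
    z_dim, e_dim, x_dim = 16, 16, 64
    images = torch.rand(128, x_dim)
    G = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim))
    T = nn.Sequential(nn.Linear(e_dim, 128), nn.ReLU(), nn.Linear(128, z_dim))
    latents = train_glo(G, images, z_dim=z_dim)
    train_imle(T, latents, e_dim=e_dim)
    print(sample(G, T, 4, e_dim=e_dim).shape)  # -> torch.Size([4, 64])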
