Discriminator Contrastive Divergence: Semi-Amortized Generative Modeling by Exploring Energy of the Discriminator

Generative Adversarial Networks (GANs) have shown great promise in modeling high-dimensional data. The learning objective of GANs usually minimizes some discrepancy measure, \textit{e.g.}, an $f$-divergence~($f$-GANs) or an Integral Probability Metric~(Wasserstein GANs). With an $f$-divergence as the objective, the discriminator essentially estimates a density ratio, and this estimated ratio proves useful for further improving the sample quality of the generator. However, how to leverage the information contained in the discriminator of Wasserstein GANs (WGAN) is less explored. In this paper, we introduce Discriminator Contrastive Divergence, which is motivated by the properties of the WGAN discriminator and the relationship between WGAN and energy-based models. In contrast to standard GANs, where the generator directly produces new samples, our method is a semi-amortized generation procedure: samples are initialized from the generator's output and then refined by several steps of Langevin dynamics guided by the gradient of the discriminator. We demonstrate significantly improved generation quality on both synthetic data and several real-world image generation benchmarks.
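
The paper's exact update rule and hyperparameters are not reproduced here, but a minimal PyTorch-style sketch of the described semi-amortized procedure might look as follows. The function name and the `generator`, `discriminator`, `n_steps`, and `step_size` arguments are illustrative assumptions, not the authors' implementation; the negative discriminator score is treated as an energy and samples are refined by Langevin dynamics starting from the generator's output.

```python
import torch

def refine_with_discriminator(generator, discriminator, z,
                              n_steps=20, step_size=0.01):
    """Sketch: start from the generator's output and run a few Langevin
    steps that descend the energy E(x) = -D(x) defined by a WGAN-style
    discriminator. Hyperparameters are illustrative assumptions."""
    # Amortized proposal: the generator maps latent z to an initial sample.
    x = generator(z).detach().requires_grad_(True)
    for _ in range(n_steps):
        # Energy of the current sample under the discriminator.
        energy = -discriminator(x).sum()
        grad = torch.autograd.grad(energy, x)[0]
        # Langevin update: gradient step on the energy plus Gaussian noise.
        x = x - 0.5 * step_size * grad \
            + (step_size ** 0.5) * torch.randn_like(x)
        x = x.detach().requires_grad_(True)
    return x.detach()
```

A usage sketch would draw latents `z`, call `refine_with_discriminator` after GAN training has converged, and treat the refined tensor as the final samples; the amortized generator supplies a good initialization so only a handful of Langevin steps are needed.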
