Maximum Entropy Generators for Energy-Based Models

Maximum likelihood estimation of energy-based models is a challenging problem because the gradient of the log-likelihood is intractable. In this work, we propose learning both the energy function and an amortized approximate sampling mechanism using a neural generator network, which provides an efficient approximation of the log-likelihood gradient. The resulting objective requires maximizing the entropy of the generated samples, which we perform using recently proposed nonparametric mutual information estimators. Finally, to stabilize the resulting adversarial game, we use a zero-centered gradient penalty derived as a necessary condition from the score matching literature. The proposed technique generates sharp images with Inception and FID scores competitive with recent GAN methods, does not suffer from mode collapse, and is competitive with state-of-the-art anomaly detection techniques.
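
To make the described training procedure concrete, below is a minimal PyTorch-style sketch of one plausible training step under these ideas. The network handles (E_net, G_net), the mi_estimator interface, and the hyperparameters lambda_gp and beta_ent are illustrative assumptions, not the paper's exact implementation: the energy function is trained on a contrastive objective with a zero-centered gradient penalty on real data, while the generator is trained to produce low-energy samples whose entropy is kept high through a variational mutual-information lower bound.

```python
import torch

def training_step(E_net, G_net, mi_estimator, x_real, z,
                  opt_E, opt_G, opt_mi, lambda_gp=1.0, beta_ent=1.0):
    # --- Energy update: approximate the MLE gradient with generator samples ---
    x_real = x_real.requires_grad_(True)
    energy_real = E_net(x_real)
    x_fake = G_net(z).detach()                 # no gradient into the generator here
    e_real, e_fake = energy_real.mean(), E_net(x_fake).mean()

    # Zero-centered gradient penalty on real data (score-matching-style
    # regularizer used to stabilize the adversarial game).
    grad = torch.autograd.grad(energy_real.sum(), x_real, create_graph=True)[0]
    gp = grad.flatten(1).pow(2).sum(dim=1).mean()

    loss_E = e_real - e_fake + lambda_gp * gp
    opt_E.zero_grad()
    loss_E.backward()
    opt_E.step()

    # --- Generator update: low-energy samples with high entropy ---
    x_gen = G_net(z)
    mi_lower_bound = mi_estimator(z, x_gen)    # variational bound on I(z; G(z)),
                                               # used as an entropy surrogate
    loss_G = E_net(x_gen).mean() - beta_ent * mi_lower_bound
    opt_G.zero_grad()
    opt_mi.zero_grad()
    loss_G.backward()
    opt_G.step()
    opt_mi.step()
    return loss_E.item(), loss_G.item()
```

In a setup like this, the mutual-information estimator would itself be a small network trained jointly with the generator (in the spirit of MINE-style estimators), since both benefit from maximizing the same lower bound.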
