Learning Energy-Based Models by Diffusion Recovery Likelihood

While energy-based models (EBMs) exhibit a number of desirable properties, training and sampling them on high-dimensional datasets remain challenging. Inspired by recent progress on diffusion probabilistic models, we present a diffusion recovery likelihood method to tractably learn and sample from a sequence of EBMs trained on increasingly noisy versions of a dataset. Each EBM is trained by maximizing the recovery likelihood: the conditional probability of the data at a certain noise level given their noisy versions at a higher noise level. The recovery likelihood objective is more tractable than the marginal likelihood objective, since it only requires MCMC sampling from a relatively concentrated conditional distribution. Moreover, we show that this estimation method is theoretically consistent: given sufficient data, it learns the correct conditional and marginal distributions at each noise level. After training, synthesized images can be generated efficiently by a sampling process that initializes from a spherical Gaussian distribution and progressively samples from the conditional distributions at successively lower noise levels. Our method generates high-fidelity samples on various image datasets. On unconditional CIFAR-10 it achieves an FID of 9.60 and an Inception score of 8.58, outperforming the majority of GANs. Moreover, we demonstrate that, unlike in previous work on EBMs, our long-run MCMC samples from the conditional distributions do not diverge and still represent realistic images, allowing us to accurately estimate the normalized density of data even for high-dimensional datasets.
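To make the training objective concrete, the following is a minimal sketch of the recovery-likelihood form described above; the notation is ours, not quoted from the paper. Write the EBM at a given noise level as $p_\theta(x) \propto \exp(f_\theta(x))$ and let $\tilde{x} = x + \sigma \epsilon$, with $\epsilon \sim \mathcal{N}(0, I)$, be the noisier version of $x$. The conditional distribution whose likelihood is maximized is then

$$p_\theta(x \mid \tilde{x}) = \frac{1}{\tilde{Z}_\theta(\tilde{x})} \exp\!\Big( f_\theta(x) - \frac{1}{2\sigma^2} \lVert \tilde{x} - x \rVert^2 \Big),$$

where $\tilde{Z}_\theta(\tilde{x})$ is the conditional normalizing constant. The quadratic term pins $x$ near $\tilde{x}$, so this conditional is far more concentrated than the marginal $p_\theta(x)$, which is why relatively short MCMC chains suffice during training.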
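The progressive sampling procedure can likewise be sketched in code. The following is a hypothetical PyTorch illustration, not the authors' implementation: `energy_fn` (the per-level energy network), `sigmas` (the noise schedule), and the step sizes are placeholder assumptions, and the per-level rescaling used in practice is omitted.

```python
import torch

def conditional_langevin(energy_fn, x_noisy, t, sigma, n_steps=30, step_size=0.01):
    """Langevin sampling from p(x | x_noisy) at noise level t.

    log p(x | x_noisy) = f(x, t) - ||x_noisy - x||^2 / (2 sigma^2) + const,
    so its gradient w.r.t. x is grad_f(x, t) + (x_noisy - x) / sigma^2.
    """
    x = x_noisy.clone()
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        grad_f = torch.autograd.grad(energy_fn(x, t).sum(), x)[0]
        grad_log_p = grad_f + (x_noisy - x) / sigma**2
        # Langevin update: drift along the log-density gradient plus Gaussian noise.
        x = x + 0.5 * step_size**2 * grad_log_p + step_size * torch.randn_like(x)
    return x.detach()

def progressive_sample(energy_fn, sigmas, shape):
    """Initialize from a spherical Gaussian, then anneal from the
    highest noise level down toward the data distribution."""
    x = torch.randn(shape) * sigmas[-1]
    for t in reversed(range(len(sigmas))):  # high noise -> low noise
        x = conditional_langevin(energy_fn, x, t, sigmas[t])
    return x
```

Because each chain targets a concentrated conditional rather than the full marginal, a few dozen Langevin steps per level are typically enough, which is what makes sampling efficient after training.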
