Improved Denoising Diffusion Probabilistic Models

Denoising diffusion probabilistic models (DDPMs) are a class of generative models which have recently been shown to produce excellent samples. We show that, with a few simple modifications, DDPMs can also achieve competitive log-likelihoods while maintaining high sample quality. Additionally, we find that learning the variances of the reverse diffusion process allows sampling with an order of magnitude fewer forward passes with a negligible difference in sample quality, which is important for the practical deployment of these models. We also use precision and recall to compare how well DDPMs and GANs cover the target distribution. Finally, we show that the sample quality and likelihood of these models scale smoothly with model capacity and training compute, making them easy to scale up. We release our code and pre-trained models at https://github.com/openai/improved-diffusion.
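To make the two ideas highlighted in the abstract more concrete, the sketch below shows one way to parameterize a learned reverse-process variance as a log-space interpolation between beta_t and the posterior variance beta_tilde_t, and to sample with far fewer forward passes by striding over an evenly spaced subsequence of the training timesteps. This is a minimal NumPy illustration under our own assumptions; the function names (posterior_log_variance, interpolated_log_variance, strided_timesteps, betas_for_subsequence) are hypothetical and do not correspond to the released code's API.

import numpy as np

def posterior_log_variance(betas):
    # log of beta_tilde_t = (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t,
    # the variance of the true reverse-process posterior q(x_{t-1} | x_t, x_0).
    alpha_bar = np.cumprod(1.0 - betas)
    alpha_bar_prev = np.append(1.0, alpha_bar[:-1])
    beta_tilde = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar) * betas
    return np.log(np.maximum(beta_tilde, 1e-20))  # clamp: beta_tilde_1 is exactly 0

def interpolated_log_variance(v, betas):
    # Learned variance expressed as a log-space interpolation between beta_t and
    # beta_tilde_t, with the interpolation weight v predicted by the network.
    return v * np.log(betas) + (1.0 - v) * posterior_log_variance(betas)

def strided_timesteps(num_train_steps, num_sample_steps):
    # Evenly spaced subsequence of the training timesteps to use at sampling time.
    return np.linspace(0, num_train_steps - 1, num_sample_steps).round().astype(int)

def betas_for_subsequence(betas, timesteps):
    # Recompute per-step betas for the shorter chain so that its cumulative
    # alpha_bar matches the original schedule at the selected timesteps.
    alpha_bar = np.cumprod(1.0 - betas)[timesteps]
    alpha_bar_prev = np.append(1.0, alpha_bar[:-1])
    return 1.0 - alpha_bar / alpha_bar_prev

# Example: a 1000-step linear schedule sampled with only 100 forward passes.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
use_ts = strided_timesteps(T, 100)
short_betas = betas_for_subsequence(betas, use_ts)
log_var = interpolated_log_variance(0.5, betas)  # v = 0.5 everywhere, for illustration

The key point of the striding step is that the shorter sampling chain reuses the cumulative alpha_bar values of the original schedule, so no retraining is needed when reducing the number of forward passes.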
