Progressive Distillation for Fast Sampling of Diffusion Models

Diffusion models have recently shown great promise for generative modeling, outperforming GANs on perceptual quality and autoregressive models at density estimation. A remaining downside is their slow sampling time: generating high-quality samples takes many hundreds or thousands of model evaluations. Here we make two contributions to help eliminate this downside: First, we present new parameterizations of diffusion models that provide increased stability when using few sampling steps. Second, we present a method to distill a trained deterministic diffusion sampler, using many steps, into a new diffusion model that takes half as many sampling steps. We then keep progressively applying this distillation procedure to our model, halving the number of required sampling steps each time. On standard image generation benchmarks like CIFAR-10, ImageNet, and LSUN, we start out with state-of-the-art samplers taking as many as 8192 steps, and are able to distill down to models taking as few as 4 steps without losing much perceptual quality; achieving, for example, a FID of 3.0 on CIFAR-10 in 4 steps. Finally, we show that the full progressive distillation procedure does not take more time than it takes to train the original model, thus representing an efficient solution for generative modeling using diffusion at both train and test time.
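The sketch below illustrates the outer loop of progressive distillation as described above: a deterministic teacher sampler with N steps is repeatedly distilled into a student that needs only N/2 steps, and the student becomes the next teacher. This is a minimal illustration, not the authors' implementation; the function names (`copy_model`, `train_student_to_match_teacher`) and the generic `Model` type are assumptions standing in for the actual network, training loop, and distillation loss.

```python
# Minimal sketch of the progressive distillation outer loop (assumed interface,
# not the authors' code). Each round trains a student so that one of its
# deterministic sampling steps matches two consecutive steps of the current
# teacher, then halves the step count.

from typing import Any, Callable, Tuple

Model = Any  # stand-in for whatever network/parameter type is used


def progressive_distillation(
    teacher: Model,
    n_steps: int,
    num_rounds: int,
    copy_model: Callable[[Model], Model],
    train_student_to_match_teacher: Callable[[Model, Model, int], Model],
) -> Tuple[Model, int]:
    """Halve the number of sampling steps `num_rounds` times."""
    for _ in range(num_rounds):
        # Initialize the student from the current teacher's weights.
        student = copy_model(teacher)
        # Train the student so one student step reproduces two teacher steps
        # of the teacher's n_steps-step deterministic sampler.
        student = train_student_to_match_teacher(student, teacher, n_steps)
        # The student becomes the teacher for the next, shorter sampler.
        teacher, n_steps = student, n_steps // 2
    return teacher, n_steps


# Example of the step arithmetic: starting from an 8192-step sampler,
# eleven halvings yield a 4-step sampler, since 8192 // 2**11 == 4.
```

This only shows the halving schedule; the per-round training objective (matching two teacher steps with one student step under the paper's parameterizations) is left abstract behind `train_student_to_match_teacher`.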
