Fast Sampling of Diffusion Models via Operator Learning

Diffusion models have found widespread adoption in various areas. However, their sampling process is slow because it requires hundreds to thousands of network evaluations to emulate a continuous process defined by differential equations. In this work, we use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models. Compared to other fast sampling methods that have a sequential nature, we are the first to propose a parallel decoding method that generates images with only one model forward pass. We propose diffusion model sampling with neural operator (DSNO) that maps the initial condition, i.e., Gaussian distribution, to the continuous-time solution trajectory of the reverse diffusion process. To model the temporal correlations along the trajectory, we introduce temporal convolution layers that are parameterized in the Fourier space into the given diffusion model backbone. We show our method achieves state-of-the-art FID of 3.78 for CIFAR-10 and 7.83 for ImageNet-64 in the one-model-evaluation setting.

[1]  Prafulla Dhariwal,et al.  Consistency Models , 2023, ICML.

[2]  W. Freeman,et al.  Muse: Text-To-Image Generation via Masked Generative Transformers , 2023, ICML.

[3]  William S. Peebles,et al.  Scalable Diffusion Models with Transformers , 2022, 2023 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  J. Tenenbaum,et al.  Is Conditional Generative Modeling all you need for Decision-Making? , 2022, ICLR.

[5]  Karsten Kreis,et al.  GENIE: Higher-Order Denoising Diffusion Solvers , 2022, NeurIPS.

[6]  Diederik P. Kingma,et al.  On Distillation of Guided Diffusion Models , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Hang Su,et al.  All are Worth Words: A ViT Backbone for Diffusion Models , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  J. Z. Kolter,et al.  (Certified!!) Adversarial Robustness for Free! , 2022, ICLR.

[9]  Bo Zhang,et al.  Estimating the Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models , 2022, ICML.

[10]  Cheng Lu,et al.  DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps , 2022, NeurIPS.

[11]  Junyan Zhu,et al.  On Aliased Resizing and Surprising Subtleties in GAN Evaluation , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Tero Karras,et al.  Elucidating the Design Space of Diffusion-Based Generative Models , 2022, NeurIPS.

[13]  Anima Anandkumar,et al.  Diffusion Models for Adversarial Purification , 2022, ICML.

[14]  Yongxin Chen,et al.  Fast Sampling of Diffusion Models with Exponential Integrator , 2022, ICLR.

[15]  Prafulla Dhariwal,et al.  Hierarchical Text-Conditional Image Generation with CLIP Latents , 2022, ArXiv.

[16]  S. Ermon,et al.  GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation , 2022, ICLR.

[17]  Yi Ren,et al.  Pseudo Numerical Methods for Diffusion Models on Manifolds , 2022, ICLR.

[18]  Mingyuan Zhou,et al.  Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders , 2022, ICLR.

[19]  Mohammad Norouzi,et al.  Learning Fast Samplers for Diffusion Models by Differentiating Through Sample Quality , 2022, ICLR.

[20]  Aaron C. Courville,et al.  Generative Adversarial Networks , 2022, 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT).

[21]  Tim Salimans,et al.  Progressive Distillation for Fast Sampling of Diffusion Models , 2022, ICLR.

[22]  Bo Zhang,et al.  Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models , 2022, ICLR.

[23]  B. Ommer,et al.  High-Resolution Image Synthesis with Latent Diffusion Models , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Karsten Kreis,et al.  Tackling the Generative Learning Trilemma with Denoising Diffusion GANs , 2021, ICLR.

[25]  Dan Su,et al.  Bilateral Denoising Diffusion Models , 2021, ArXiv.

[26]  Nikola B. Kovachki,et al.  Neural Operator: Learning Maps Between Function Spaces , 2021, ArXiv.

[27]  Kamyar Azizzadenesheli,et al.  Seismic wave propagation and inversion with Neural Operators , 2021, The Seismic Record.

[28]  Siddhartha Mishra,et al.  On universal approximation and error bounds for Fourier Neural Operators , 2021, J. Mach. Learn. Res..

[29]  Jan Kautz,et al.  Score-based Generative Modeling in Latent Space , 2021, NeurIPS.

[30]  Zhifeng Kong,et al.  On Fast Sampling of Diffusion Probabilistic Models , 2021, ArXiv.

[31]  Tasnima Sadekova,et al.  Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech , 2021, ICML.

[32]  Prafulla Dhariwal,et al.  Diffusion Models Beat GANs on Image Synthesis , 2021, NeurIPS.

[33]  Eric Luhman,et al.  Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed , 2021, ArXiv.

[34]  Abhishek Kumar,et al.  Score-Based Generative Modeling through Stochastic Differential Equations , 2020, ICLR.

[35]  Nikola B. Kovachki,et al.  Fourier Neural Operator for Parametric Partial Differential Equations , 2020, ICLR.

[36]  Jiaming Song,et al.  Denoising Diffusion Implicit Models , 2020, ICLR.

[37]  Bryan Catanzaro,et al.  DiffWave: A Versatile Diffusion Model for Audio Synthesis , 2020, ICLR.

[38]  Pieter Abbeel,et al.  Denoising Diffusion Probabilistic Models , 2020, NeurIPS.

[39]  Kamyar Azizzadenesheli,et al.  Neural Operator: Graph Kernel Network for Partial Differential Equations , 2020, ICLR 2020.

[40]  Jaakko Lehtinen,et al.  Improved Precision and Recall Metric for Assessing Generative Models , 2019, NeurIPS.

[41]  Omer Levy,et al.  Mask-Predict: Parallel Decoding of Conditional Masked Language Models , 2019, EMNLP.

[42]  Jeff Donahue,et al.  Large Scale GAN Training for High Fidelity Natural Image Synthesis , 2018, ICLR.

[43]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[45]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Surya Ganguli,et al.  Deep Unsupervised Learning using Nonequilibrium Thermodynamics , 2015, ICML.

[47]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[48]  K. Azizzadenesheli,et al.  Accelerating Carbon Capture and Storage Modeling using Fourier Neural Operators , 2022, ArXiv.

[49]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .