Diffusion Models in Vision: A Survey

Denoising diffusion models are an emerging topic in computer vision, with remarkable results in generative modeling. A diffusion model is a deep generative model based on two stages: a forward diffusion stage and a reverse diffusion stage. In the forward stage, the input data is gradually perturbed over several steps by adding Gaussian noise. In the reverse stage, a model is tasked with recovering the original input data by learning to reverse the diffusion process, step by step. Diffusion models are widely appreciated for the quality and diversity of the samples they generate, despite a known computational burden: sampling is slow because it involves a large number of steps. In this survey, we provide a comprehensive review of articles on denoising diffusion models applied in vision, covering both theoretical and practical contributions to the field. First, we identify and present three generic diffusion modeling frameworks, based on denoising diffusion probabilistic models, noise conditioned score networks, and stochastic differential equations. We further discuss the relations between diffusion models and other deep generative models, including variational auto-encoders, generative adversarial networks, energy-based models, autoregressive models, and normalizing flows. Then, we introduce a multi-perspective categorization of diffusion models applied in computer vision. Finally, we illustrate the current limitations of diffusion models and envision some interesting directions for future research.
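To make the two stages concrete, here is a minimal sketch of the denoising diffusion probabilistic model (DDPM) variant of the forward and reverse processes, written in plain Python on small 1D vectors. It assumes the standard closed form of the forward process, q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I) with a_bar_t = prod_{s<=t} (1 - beta_s), an assumed linear variance schedule from 1e-4 to 0.02, and a reverse-step variance sigma_t^2 = beta_t. The noise predictor `eps_model` is a hypothetical stand-in for a trained network, and all function names are illustrative rather than taken from any particular library.

```python
import math
import random

# Indices are zero-based: t = 0 here corresponds to step 1 in the usual notation.


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, linearly spaced (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): the fraction of signal left at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Forward stage: sample x_t ~ q(x_t | x_0) in a single jump.

    Returns (x_t, eps); eps is the Gaussian noise that was added, which a
    DDPM-style network is trained to predict from (x_t, t).
    """
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
           for x, e in zip(x0, eps)]
    return x_t, eps


def reverse_step(x_t, t, eps_hat, betas, alpha_bars, rng):
    """Reverse stage: one ancestral step x_t -> x_{t-1} given a noise estimate.

    mean = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(1 - beta_t),
    with variance sigma_t^2 = beta_t (an assumed, commonly used choice).
    """
    beta, a_bar = betas[t], alpha_bars[t]
    coef = beta / math.sqrt(1.0 - a_bar)
    mean = [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(x_t, eps_hat)]
    if t == 0:
        return mean  # last step: return the mean without adding noise
    sigma = math.sqrt(beta)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


def sample(eps_model, dim, betas, rng):
    """Generate one sample by running every reverse step from pure noise.

    eps_model(x, t) is a hypothetical trained noise predictor; it is called
    once per step, so the cost grows linearly with the number of steps.
    """
    alpha_bars = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(len(betas))):
        x = reverse_step(x, t, eps_model(x, t), betas, alpha_bars, rng)
    return x
```

Because each reverse step needs one evaluation of the noise predictor, generating a sample with T steps costs T sequential network calls, which is the source of the slow sampling mentioned in the abstract. With the schedule assumed here and T = 1000, a_bar_T falls below 1e-3, so x_T is almost pure Gaussian noise and the reverse stage has to rebuild nearly all of the signal.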
