A Geometric Perspective on Diffusion Models

Recent years have witnessed significant progress in developing efficient training and fast sampling techniques for diffusion models. A remarkable recent advance is the use of stochastic differential equations (SDEs) to describe data perturbation and generative modeling in a unified mathematical framework. In this paper, we reveal several intriguing geometric structures of diffusion models and contribute a simple yet powerful interpretation of their sampling dynamics. By carefully inspecting a popular variance-exploding SDE and its marginal-preserving ordinary differential equation (ODE) for sampling, we discover that the data distribution and the noise distribution are smoothly connected by an explicit, quasi-linear sampling trajectory and an implicit denoising trajectory, the latter of which converges even faster in terms of visual quality. We also establish a theoretical relationship between optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm, which allows us to characterize the asymptotic behavior of diffusion models and identify the score deviation. These new geometric observations enable us to improve previous sampling algorithms, re-examine latent interpolation, and re-explain the working principles of distillation-based fast sampling techniques.
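To make the quantities named above concrete, the following is a minimal sketch in standard notation, assuming the variance-exploding formulation of Song et al. (2021) with a noise schedule σ(t) as in Karras et al. (2022); the paper's exact parameterization may differ.

% Forward VE-SDE perturbing data toward noise (assumed standard form):
\mathrm{d}\mathbf{x} = \sqrt{\frac{\mathrm{d}\,\sigma^2(t)}{\mathrm{d}t}}\,\mathrm{d}\mathbf{w}_t .

% Its marginal-preserving probability-flow ODE, used for deterministic sampling:
\frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t} = -\dot{\sigma}(t)\,\sigma(t)\,\nabla_{\mathbf{x}} \log p\bigl(\mathbf{x};\sigma(t)\bigr).

% Tweedie's formula: the score yields the posterior-mean (optimal) denoiser:
\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}] = \mathbf{x} + \sigma^2(t)\,\nabla_{\mathbf{x}} \log p\bigl(\mathbf{x};\sigma(t)\bigr).

% For an empirical data distribution \{\mathbf{x}_i\}, this posterior mean is a
% Gaussian-kernel weighted average of the training points, i.e. exactly one
% mean-shift (mode-seeking) step:
\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}] =
  \frac{\sum_i \mathbf{x}_i \exp\bigl(-\lVert\mathbf{x}-\mathbf{x}_i\rVert^2 / 2\sigma^2(t)\bigr)}
       {\sum_i \exp\bigl(-\lVert\mathbf{x}-\mathbf{x}_i\rVert^2 / 2\sigma^2(t)\bigr)} .

Combining the last two identities rewrites the ODE velocity as \frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t} = \frac{\dot{\sigma}(t)}{\sigma(t)}\bigl(\mathbf{x} - \mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}]\bigr), so each reverse-time sampling step moves the sample toward its kernel-weighted posterior mean; this is the mean-shift reading of the sampling trajectory that the abstract refers to.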
