A Geometric Perspective on Diffusion Models

Recent years have witnessed significant progress in developing efficient training and fast sampling techniques for diffusion models. A remarkable recent advance is the use of stochastic differential equations (SDEs) to describe data perturbation and generative modeling in a unified mathematical framework. In this paper, we reveal several intriguing geometric structures of diffusion models and contribute a simple yet powerful interpretation of their sampling dynamics. By carefully inspecting a popular variance-exploding SDE and its marginal-preserving ordinary differential equation (ODE) for sampling, we discover that the data distribution and the noise distribution are smoothly connected by an explicit, quasi-linear sampling trajectory and an implicit denoising trajectory, the latter of which converges even faster in terms of visual quality. We also establish a theoretical relationship between optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm, which allows us to characterize the asymptotic behavior of diffusion models and identify the score deviation. These new geometric observations enable us to improve previous sampling algorithms, re-examine latent interpolation, and re-explain the working principles of distillation-based fast sampling techniques.
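To make the quantities named above concrete, the following is a minimal sketch in standard notation, assuming the variance-exploding formulation of Song et al. (2021) with a noise schedule σ(t) as in Karras et al. (2022); the paper's exact parameterization may differ.

% Forward VE-SDE perturbing data toward noise (assumed standard form):
\mathrm{d}\mathbf{x} = \sqrt{\frac{\mathrm{d}\,\sigma^2(t)}{\mathrm{d}t}}\,\mathrm{d}\mathbf{w}_t .

% Its marginal-preserving probability-flow ODE, used for deterministic sampling:
\frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t} = -\dot{\sigma}(t)\,\sigma(t)\,\nabla_{\mathbf{x}} \log p\bigl(\mathbf{x};\sigma(t)\bigr).

% Tweedie's formula: the score yields the posterior-mean (optimal) denoiser:
\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}] = \mathbf{x} + \sigma^2(t)\,\nabla_{\mathbf{x}} \log p\bigl(\mathbf{x};\sigma(t)\bigr).

% For an empirical data distribution \{\mathbf{x}_i\}, this posterior mean is a
% Gaussian-kernel weighted average of the training points, i.e. exactly one
% mean-shift (mode-seeking) step:
\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}] =
  \frac{\sum_i \mathbf{x}_i \exp\bigl(-\lVert\mathbf{x}-\mathbf{x}_i\rVert^2 / 2\sigma^2(t)\bigr)}
       {\sum_i \exp\bigl(-\lVert\mathbf{x}-\mathbf{x}_i\rVert^2 / 2\sigma^2(t)\bigr)} .

Combining the last two identities rewrites the ODE velocity as \frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t} = \frac{\dot{\sigma}(t)}{\sigma(t)}\bigl(\mathbf{x} - \mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}]\bigr), so each reverse-time sampling step moves the sample toward its kernel-weighted posterior mean; this is the mean-shift reading of the sampling trajectory that the abstract refers to.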
