Denoising Diffusion Gamma Models

Generative diffusion processes are an emerging and effective tool for image and speech generation. In existing methods, the underlying noise distribution of the diffusion process is Gaussian. However, fitting distributions with more degrees of freedom could improve the performance of such generative models. In this work, we investigate other noise distributions for the diffusion process. Specifically, we introduce the Denoising Diffusion Gamma Model (DDGM) and show that noise drawn from a Gamma distribution improves results for image and speech generation. Our approach uses Gamma noise while preserving the ability to efficiently sample the state at an arbitrary step of the diffusion process during training.
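The efficient-sampling property mentioned above can be illustrated with a minimal sketch. It relies on two standard facts: scaling a Gamma variable scales its scale parameter, and independent Gamma variables with a common scale add their shapes. The parameterization below (scale theta_t = sqrt(alpha_bar_t) * theta_0, shape k_t = beta_t / (alpha_bar_t * theta_0^2), and a linear beta schedule) is an assumption made for illustration and is not stated in the abstract; the function names gamma_schedule and sample_xt are hypothetical. Under this assumption, the zero-mean Gamma noise accumulated over t steps is again a single centered Gamma variable with variance 1 - alpha_bar_t, so x_t can be drawn directly from x_0 without simulating the intermediate steps.

```python
import math
import random


def gamma_schedule(betas, theta0):
    """Precompute (sqrt(alpha_bar_t), cumulative shape, scale) for t = 1..T.

    Assumed parameterization (not taken from the abstract):
      theta_t = sqrt(alpha_bar_t) * theta0
      k_t     = beta_t / (alpha_bar_t * theta0**2)
    """
    schedule = []
    alpha_bar = 1.0
    k_bar = 0.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
        k_bar += beta / (alpha_bar * theta0 ** 2)  # shapes add across steps
        theta_t = math.sqrt(alpha_bar) * theta0
        schedule.append((math.sqrt(alpha_bar), k_bar, theta_t))
    return schedule


def sample_xt(x0, t, schedule, rng=random):
    """Draw x_t given x_0 in one shot (t is 1-indexed).

    Each coordinate gets independent Gamma noise with its mean subtracted,
    so the added noise has zero mean and variance 1 - alpha_bar_t.
    """
    sqrt_ab, k_bar, theta_t = schedule[t - 1]
    mean = k_bar * theta_t
    return [sqrt_ab * x + (rng.gammavariate(k_bar, theta_t) - mean) for x in x0]


if __name__ == "__main__":
    # Hypothetical linear beta schedule with 100 steps.
    betas = [1e-4 + (0.02 - 1e-4) * i / 99 for i in range(100)]
    schedule = gamma_schedule(betas, theta0=0.001)
    print(sample_xt([0.5, -0.5, 0.0], t=50, schedule=schedule))
```

The design choice worth noting is that the cumulative shape is a running sum, so the whole schedule is computed once and each training example costs a single Gamma draw per coordinate, the same order of work as the Gaussian case.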
