SinDDM: A Single Image Denoising Diffusion Model

Denoising diffusion models (DDMs) have led to staggering performance leaps in image generation, editing, and restoration. However, existing DDMs require very large datasets for training. Here, we introduce a framework for training a DDM on a single image. Our method, which we coin SinDDM, learns the internal statistics of the training image through a multi-scale diffusion process. To drive the reverse diffusion process, we use a fully convolutional, lightweight denoiser conditioned on both the noise level and the scale. This architecture allows generating samples of arbitrary dimensions in a coarse-to-fine manner. As we illustrate, SinDDM generates diverse high-quality samples and is applicable to a wide array of tasks, including style transfer and harmonization. Furthermore, it can be easily guided by external supervision; in particular, we demonstrate text-guided generation from a single image using a pre-trained CLIP model.
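To make the coarse-to-fine sampling loop described above concrete, here is a toy sketch in pure Python. This is not the authors' implementation: `toy_denoiser` is a hypothetical stub standing in for the lightweight fully-convolutional denoiser (which, as the abstract notes, is conditioned on both the noise level and the scale), and the noise schedule, re-noising factor, and grid sizes are illustrative choices only.

```python
import math
import random

def toy_denoiser(x, t, scale):
    # Hypothetical stand-in for SinDDM's denoiser, conditioned on the
    # timestep t and scale index. A real model predicts the noise in x;
    # this stub simply predicts zeros.
    return [[0.0 for _ in row] for row in x]

def upsample(x, factor=2):
    # Nearest-neighbour upsampling: each pixel repeated factor x factor.
    out = []
    for row in x:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def reverse_diffusion(x, scale, betas, alphabars, denoiser):
    # Standard DDPM-style reverse process, run at a single scale.
    for t in reversed(range(len(betas))):
        beta, abar = betas[t], alphabars[t]
        alpha = 1.0 - beta
        eps = denoiser(x, t, scale)
        x = [[(xv - beta / math.sqrt(1.0 - abar) * ev) / math.sqrt(alpha)
              + (math.sqrt(beta) * random.gauss(0.0, 1.0) if t > 0 else 0.0)
              for xv, ev in zip(xrow, erow)]
             for xrow, erow in zip(x, eps)]
    return x

def sample_multiscale(coarse_hw=(4, 4), num_scales=3, steps=10,
                      denoiser=toy_denoiser):
    # Linear beta schedule and cumulative alpha-bar products (toy values).
    betas = [1e-4 + (0.02 - 1e-4) * t / max(steps - 1, 1)
             for t in range(steps)]
    alphabars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alphabars.append(prod)

    h, w = coarse_hw
    x = [[random.gauss(0.0, 1.0) for _ in range(w)] for _ in range(h)]
    for scale in range(num_scales):
        if scale > 0:
            # Coarse-to-fine: upsample the previous scale's sample,
            # re-noise it, then refine it with the reverse process.
            x = upsample(x)
            x = [[v + 0.5 * random.gauss(0.0, 1.0) for v in row]
                 for row in x]
        x = reverse_diffusion(x, scale, betas, alphabars, denoiser)
    return x

img = sample_multiscale()
print(len(img), len(img[0]))  # 4x4 coarse grid, upsampled twice -> 16 16
```

Because the denoiser is fully convolutional, nothing in this loop ties it to a fixed resolution; changing `coarse_hw` yields samples of arbitrary dimensions, which is the property the architecture exploits.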
