SDEdit: Image Synthesis and Editing with Stochastic Differential Equations

We introduce a new image editing and synthesis framework, Stochastic Differential Editing (SDEdit), based on a recent generative model using stochastic differential equations (SDEs). Given an input image with user edits (e.g., hand-drawn color strokes), we first add noise to the input according to an SDE, and subsequently denoise it by simulating the reverse SDE to gradually increase its likelihood under the prior. Our method does not require task-specific loss function designs, which are critical components for recent image editing methods based on GAN inversion. Compared to conditional GANs, we do not need to collect new datasets of original and edited images for new applications. Therefore, our method can quickly adapt to various editing tasks at test time without re-training models. Our approach achieves strong performance on a wide range of applications, including image synthesis and editing guided by stroke paintings and image compositing.

[1]  Alexei A. Efros,et al.  Real-time user-guided image colorization with learned deep priors , 2017, ACM Trans. Graph..

[2]  Daniel Cohen-Or,et al.  Designing an encoder for StyleGAN image manipulation , 2021, ACM Trans. Graph..

[3]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Chuang Gan,et al.  Sparse, Smart Contours to Represent and Edit Images , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Tim Sauer,et al.  Numerical Solution of Stochastic Differential Equations in Finance , 2012 .

[6]  Peter Wonka,et al.  SEAN: Image Synthesis With Semantic Region-Adaptive Normalization , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Chen Change Loy,et al.  Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation , 2020, ECCV.

[8]  Stefano Ermon,et al.  Permutation Invariant Graph Generation via Score-Based Generative Modeling , 2020, AISTATS.

[9]  Taesung Park,et al.  Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Dongdong Chen,et al.  Deep exemplar-based colorization , 2018, ACM Trans. Graph..

[11]  Bolei Zhou,et al.  Image Processing Using Multi-Code GAN Prior , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Ioannis Mitliagkas,et al.  Adversarial score matching and improved sampling for image generation , 2020, ICLR.

[13]  David Bau,et al.  Rewriting a Deep Generative Model , 2020, ECCV.

[14]  Stefano Ermon,et al.  Improved Techniques for Training Score-Based Generative Models , 2020, NeurIPS.

[15]  Yang Song,et al.  Generative Modeling by Estimating Gradients of the Data Distribution , 2019, NeurIPS.

[16]  Eero P. Simoncelli,et al.  Least Squares Estimation Without Priors or Supervision , 2011, Neural Computation.

[17]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[18]  Fisher Yu,et al.  TextureGAN: Controlling Deep Image Synthesis with Texture Patches , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Eero P. Simoncelli,et al.  Learning to be Bayesian without Supervision , 2006, NIPS.

[20]  Ying Nian Wu,et al.  Learning Energy-Based Models as Generative ConvNets via Multi-grid Modeling and Sampling. , 2020 .

[21]  C. Stein A bound for the error in the normal approximation to the distribution of a sum of dependent random variables , 1972 .

[22]  Vladlen Koltun,et al.  Photographic Image Synthesis with Cascaded Refinement Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Jaakko Lehtinen,et al.  Analyzing and Improving the Image Quality of StyleGAN , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Jiaming Song,et al.  Denoising Diffusion Implicit Models , 2021, ICLR.

[25]  Pieter Abbeel,et al.  Denoising Diffusion Probabilistic Models , 2020, NeurIPS.

[26]  Bolei Zhou,et al.  Interpreting the Latent Space of GANs for Semantic Face Editing , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Peter Wonka,et al.  Image2StyleGAN++: How to Edit the Embedded Images? , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Bolei Zhou,et al.  Semantic photo manipulation with a generative image prior , 2019, ACM Trans. Graph..

[29]  Ting-Chun Wang,et al.  Image Inpainting for Irregular Holes Using Partial Convolutions , 2018, ECCV.

[30]  Igor Mordatch,et al.  Implicit Generation and Generalization with Energy Based Models , 2018 .

[31]  Edward H. Adelson,et al.  The Laplacian Pyramid as a Compact Image Code , 1983, IEEE Trans. Commun..

[32]  Alexei A. Efros,et al.  Generative Visual Manipulation on the Natural Image Manifold , 2016, ECCV.

[33]  Matthias Zwicker,et al.  Faceshop , 2018, ACM Trans. Graph..

[34]  Patrick Pérez,et al.  Poisson image editing , 2003, ACM Trans. Graph..

[35]  Diederik P. Kingma,et al.  How to Train Your Energy-Based Models , 2021, ArXiv.

[36]  Hao Li,et al.  High-Resolution Image Inpainting Using Multi-scale Neural Patch Synthesis , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Bolei Zhou,et al.  Seeing What a GAN Cannot Generate , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[38]  Gregory Shakhnarovich,et al.  Learning Representations for Automatic Colorization , 2016, ECCV.

[39]  Daniel Cohen-Or,et al.  Pivotal Tuning for Latent-based Editing of Real Images , 2021, ACM Trans. Graph..

[40]  Prafulla Dhariwal,et al.  Diffusion Models Beat GANs on Image Synthesis , 2021, NeurIPS.

[41]  Xiaogang Wang,et al.  Context Encoding for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42]  Rushil Anirudh,et al.  MimicGAN: Robust Projection onto Image Manifolds with Corruption Mimicking , 2020, International Journal of Computer Vision.

[43]  Harshad Rai,et al.  Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks , 2018 .

[44]  Pascal Vincent,et al.  A Connection Between Score Matching and Denoising Autoencoders , 2011, Neural Computation.

[45]  Surya Ganguli,et al.  Deep Unsupervised Learning using Nonequilibrium Thermodynamics , 2015, ICML.

[46]  Yang Lu,et al.  A Theory of Generative ConvNet , 2016, ICML.

[47]  Jaakko Lehtinen,et al.  GANSpace: Discovering Interpretable GAN Controls , 2020, NeurIPS.

[48]  Minyoung Huh,et al.  Transforming and Projecting Images into Class-conditional Generative Networks , 2020, ECCV.

[49]  Yang Lu,et al.  Cooperative Learning of Energy-Based Model and Latent Variable Model via MCMC Teaching , 2018, AAAI.

[50]  Fisher Yu,et al.  Scribbler: Controlling Deep Image Synthesis with Sketch and Color , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Geoffrey E. Hinton,et al.  A Learning Algorithm for Boltzmann Machines , 1985, Cogn. Sci..

[52]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Peter Wonka,et al.  Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space? , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[54]  Aude Oliva,et al.  GANalyze: Toward Visual Definitions of Cognitive Image Properties , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[55]  Andrew Brock,et al.  Neural Photo Editing with Introspective Adversarial Networks , 2016, ICLR.

[56]  Bolei Zhou,et al.  Generative Hierarchical Features from Synthesizing Images , 2020, ArXiv.

[57]  Wei Ping,et al.  DiffWave: A Versatile Diffusion Model for Audio Synthesis , 2020, ICLR.

[58]  Qiang Liu,et al.  A Kernelized Stein Discrepancy for Goodness-of-fit Tests , 2016, ICML.

[59]  Tero Karras,et al.  Training Generative Adversarial Networks with Limited Data , 2020, NeurIPS.

[60]  Phillip Isola,et al.  On the "steerability" of generative adversarial networks , 2019, ICLR.

[61]  Alexei A. Efros,et al.  Colorful Image Colorization , 2016, ECCV.

[62]  Daniel Cohen-Or,et al.  Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[63]  Christian Wolf,et al.  Interactive example-based terrain authoring with conditional generative adversarial networks , 2017, ACM Trans. Graph..

[64]  Dani Lischinski,et al.  StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[65]  Lester W. Mackey,et al.  Measuring Sample Quality with Stein's Method , 2015, NIPS.

[66]  B. Anderson Reverse-time diffusion equation models , 1982 .

[67]  Abhishek Kumar,et al.  Score-Based Generative Modeling through Stochastic Differential Equations , 2020, ICLR.

[68]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[69]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[70]  Deli Zhao,et al.  In-Domain GAN Inversion for Real Image Editing , 2020, ECCV.

[71]  Hiroshi Ishikawa,et al.  Globally and locally consistent image completion , 2017, ACM Trans. Graph..

[72]  Dani Lischinski,et al.  Non-stationary texture synthesis by adversarial expansion , 2018, ACM Trans. Graph..