Paper Information - Palette: Image-to-Image Diffusion Models

Palette: Image-to-Image Diffusion Models

We introduce Palette, a simple and general framework for image-to-image translation using conditional diffusion models. On four challenging image-to-image translation tasks (colorization, inpainting, uncropping, and JPEG decompression), Palette outperforms strong GAN and regression baselines and establishes a new state of the art. This is accomplished without task-specific hyper-parameter tuning, architecture customization, or any auxiliary loss, demonstrating a desirable degree of generality and flexibility. We uncover the impact of using L2 vs. L1 loss in the denoising diffusion objective on sample diversity, and demonstrate the importance of self-attention through empirical architecture studies. Importantly, we advocate a unified evaluation protocol based on ImageNet, and report several sample quality scores, including FID, Inception Score, Classification Accuracy of a pre-trained ResNet-50, and Perceptual Distance against reference images, for various baselines. We expect this standardized evaluation protocol to play a critical role in advancing image-to-image translation research. Finally, we show that a single generalist Palette model trained on 3 tasks (colorization, inpainting, JPEG decompression) performs as well as or better than task-specific specialist counterparts. Check out https://bit.ly/palette-diffusion for more details.
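To make the L1 vs. L2 choice in the denoising objective concrete, here is a minimal sketch in plain Python. It assumes the common noise-prediction form of the objective, in which a clean target is mixed with Gaussian noise at a level `gamma` and a network is trained to predict that noise. The abstract does not spell out this formulation, so the mixing rule, the names `noisy_target`, `denoising_loss`, and `toy_denoiser`, and all numeric values are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def noisy_target(y0, eps, gamma):
    """Mix a clean target y0 with noise eps at noise level gamma in (0, 1].

    Assumed mixing rule: sqrt(gamma) * y0 + sqrt(1 - gamma) * eps.
    """
    a, b = math.sqrt(gamma), math.sqrt(1.0 - gamma)
    return [a * y + b * e for y, e in zip(y0, eps)]


def denoising_loss(eps_pred, eps, p):
    """Mean of |eps_pred - eps|^p over pixels; p=1 is L1, p=2 is L2."""
    if p not in (1, 2):
        raise ValueError("p must be 1 or 2")
    return sum(abs(a - b) ** p for a, b in zip(eps_pred, eps)) / len(eps)


def toy_denoiser(x, y_noisy, gamma):
    """Hypothetical stand-in for the network: always predicts zero noise."""
    return [0.0 for _ in y_noisy]


rng = random.Random(0)
x = [0.5, 0.5, 0.5, 0.5]   # conditioning signal, e.g. grayscale pixels
y0 = [0.2, 0.4, 0.6, 0.8]  # clean target pixels
eps = [rng.gauss(0.0, 1.0) for _ in y0]
gamma = 0.7                # illustrative noise level

y_noisy = noisy_target(y0, eps, gamma)
eps_pred = toy_denoiser(x, y_noisy, gamma)

l1 = denoising_loss(eps_pred, eps, p=1)
l2 = denoising_loss(eps_pred, eps, p=2)
print(f"L1 loss: {l1:.4f}, L2 loss: {l2:.4f}")
```

The only difference between the two variants is the exponent `p`. The abstract reports that this choice affects sample diversity; the sketch shows where the choice enters the objective, not that effect.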
