High-Fidelity Image Compression with Score-based Generative Models

Despite the tremendous success of diffusion generative models in text-to-image generation, replicating this success in the domain of image compression has proven difficult. In this paper, we demonstrate that diffusion can significantly improve perceptual quality at a given bit-rate, outperforming the state-of-the-art approaches PO-ELIC and HiFiC as measured by FID score. This is achieved with a simple but theoretically motivated two-stage approach: an autoencoder targeting MSE, followed by a score-based decoder. However, as we will show, implementation details matter, and the optimal design decisions can differ greatly from those of typical text-to-image models.
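To make the two-stage idea concrete, the following is a minimal PyTorch sketch, not the paper's actual architecture: a small MSE autoencoder stands in for the first stage, and an epsilon-prediction denoiser conditioned on the autoencoder's reconstruction stands in for the score-based decoder. All module names, layer sizes, and the noise schedule are illustrative assumptions.

```python
# Minimal sketch of a two-stage compression decoder: an MSE autoencoder produces a
# coarse reconstruction, and a conditional diffusion ("score-based") model refines it.
# Names and hyperparameters are illustrative, not the paper's.
import torch
import torch.nn as nn


class MSEAutoencoder(nn.Module):
    """Stage 1: transform coding trained with an MSE distortion term."""

    def __init__(self, channels=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2, padding=2), nn.GELU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 5, stride=2, padding=2, output_padding=1), nn.GELU(),
            nn.ConvTranspose2d(channels, 3, 5, stride=2, padding=2, output_padding=1),
        )

    def forward(self, x):
        y = self.encoder(x)
        # Straight-through rounding stands in for quantization + entropy coding.
        y_hat = y + (torch.round(y) - y).detach()
        return self.decoder(y_hat)


class ConditionalDenoiser(nn.Module):
    """Stage 2: predicts the noise in x_noisy, given the coarse reconstruction and a timestep."""

    def __init__(self, channels=64):
        super().__init__()
        self.in_conv = nn.Conv2d(6, channels, 3, padding=1)
        self.time_embed = nn.Linear(1, channels)
        self.body = nn.Sequential(
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x_noisy, x_coarse, t):
        # Condition on the coarse MSE reconstruction by channel concatenation.
        h = self.in_conv(torch.cat([x_noisy, x_coarse], dim=1))
        h = h + self.time_embed(t.float()[:, None])[:, :, None, None]
        return self.body(h)


def diffusion_loss(denoiser, x, x_coarse, alphas_cumprod):
    """DDPM-style epsilon-prediction loss, conditioned on the stage-1 reconstruction."""
    t = torch.randint(0, alphas_cumprod.shape[0], (x.shape[0],))
    a = alphas_cumprod[t][:, None, None, None]
    eps = torch.randn_like(x)
    x_noisy = a.sqrt() * x + (1.0 - a).sqrt() * eps
    return ((denoiser(x_noisy, x_coarse, t) - eps) ** 2).mean()


if __name__ == "__main__":
    x = torch.rand(2, 3, 64, 64)                  # toy batch of images in [0, 1]
    ae = MSEAutoencoder()
    denoiser = ConditionalDenoiser()
    x_coarse = ae(x).detach()                     # stage 1 output; in a two-stage setup the AE is trained first
    betas = torch.linspace(1e-4, 0.02, 1000)      # standard DDPM noise schedule
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    loss = diffusion_loss(denoiser, x, x_coarse, alphas_cumprod)
    loss.backward()
```

At decoding time, one would run a standard ancestral sampling loop of the denoiser conditioned on the transmitted coarse reconstruction, rather than returning that reconstruction directly; the bitstream itself is produced entirely by the MSE autoencoder in this reading of the abstract.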

[1] Ting Chen, On the Importance of Noise Scheduling for Diffusion Models, 2023, arXiv.

[2] Tim Salimans et al., simple diffusion: End-to-end diffusion for high resolution images, 2023, ICML.

[3] G. Toderici et al., Multi-Realism Image Compression with a Conditional Generator, 2022, arXiv.

[4] David J. Fleet et al., Scalable Adaptive Computation for Iterative Generation, 2022, ICML.

[5] I. Laptev et al., Image Compression with Product Quantized Masked Image Modeling, 2022, Trans. Mach. Learn. Res.

[6] David J. Fleet et al., Imagen Video: High Definition Video Generation with Diffusion Models, 2022, arXiv.

[7] S. Mandt et al., Lossy Image Compression with Conditional Diffusion Models, 2022, arXiv.

[8] Chengyue Gong et al., Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow, 2022, ICLR.

[9] Xiaoming Tao et al., Toward Semantic Communications: Deep Learning-Based Image Semantic Coding, 2022, IEEE Journal on Selected Areas in Communications.

[10] Jing Yu Koh et al., Scaling Autoregressive Models for Content-Rich Text-to-Image Generation, 2022, Trans. Mach. Learn. Res.

[11] Tim Salimans et al., Lossy Compression with Gaussian Diffusion, 2022, arXiv.

[12] Hongwei Qin et al., PO-ELIC: Perception-Oriented Efficient Learned Image Coding, 2022, IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[13] David J. Fleet et al., Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, 2022, NeurIPS.

[14] Prafulla Dhariwal et al., Hierarchical Text-Conditional Image Generation with CLIP Latents, 2022, arXiv.

[15] Hongwei Qin et al., ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding, 2022, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Tim Salimans et al., Progressive Distillation for Fast Sampling of Diffusion Models, 2022, ICLR.

[17] B. Ommer et al., High-Resolution Image Synthesis with Latent Diffusion Models, 2022, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18] David J. Fleet et al., Palette: Image-to-Image Diffusion Models, 2021, SIGGRAPH.

[19] Lucas Theis et al., Algorithms for the Communication of Samples, 2021, ICML.

[20] Ashish Khisti et al., Universal Rate-Distortion-Perception Representations for Lossy Compression, 2021, NeurIPS.

[21] Prafulla Dhariwal et al., Diffusion Models Beat GANs on Image Synthesis, 2021, NeurIPS.

[22] David J. Fleet et al., Image Super-Resolution via Iterative Refinement, 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23] Eirikur Agustsson et al., On the advantages of stochastic encoders, 2021, arXiv.

[24] Abhishek Kumar et al., Score-Based Generative Modeling through Stochastic Differential Equations, 2020, ICLR.

[25] Jiaming Song et al., Denoising Diffusion Implicit Models, 2020, ICLR.

[26] David Minnen et al., Channel-Wise Autoregressive Entropy Models for Learned Image Compression, 2020, IEEE International Conference on Image Processing (ICIP).

[27] Pieter Abbeel et al., Denoising Diffusion Probabilistic Models, 2020, NeurIPS.

[28] Eirikur Agustsson et al., High-Fidelity Generative Image Compression, 2020, NeurIPS.

[29] R. Manmatha et al., Saliency Driven Perceptual Image Compression, 2021, IEEE Winter Conference on Applications of Computer Vision (WACV).

[30] Haibin Shen et al., A GAN-based Tunable Image Compression System, 2020, IEEE Winter Conference on Applications of Computer Vision (WACV).

[31] Yang Song et al., Generative Modeling by Estimating Gradients of the Data Distribution, 2019, NeurIPS.

[32] Robert Peharz et al., Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters, 2018, ICLR.

[33] L. Gool et al., Generative Adversarial Networks for Extreme Learned Image Compression, 2018, IEEE International Conference on Computer Vision (ICCV).

[34] Y. Blau et al., The Perception-Distortion Tradeoff, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Sepp Hochreiter et al., GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2017, NIPS.

[36] Lubomir D. Bourdev et al., Real-Time Adaptive Image Compression, 2017, ICML.

[37] Nir Shavit et al., Generative Compression, 2018, Picture Coding Symposium (PCS).

[38] Christian Ledig et al., Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39] Thomas Brox et al., U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015, MICCAI.

[40] Surya Ganguli et al., Deep Unsupervised Learning using Nonequilibrium Thermodynamics, 2015, ICML.

[41] Aaron C. Courville et al., Generative Adversarial Nets, 2014, NIPS.

[42] Pietro Perona et al., Microsoft COCO: Common Objects in Context, 2014, ECCV.

[43] Tianlin Xu et al., Neural Image Compression with a Diffusion-Based Decoder, 2023, arXiv.

[44] Diederik P. Kingma et al., On Density Estimation with Diffusion Models, 2021, NeurIPS.