High-Fidelity Image Compression with Score-based Generative Models

Despite the tremendous success of diffusion generative models in text-to-image generation, replicating this success in the domain of image compression has proven difficult. In this paper, we demonstrate that diffusion can significantly improve perceptual quality at a given bit-rate, outperforming the state-of-the-art approaches PO-ELIC and HiFiC as measured by FID score. This is achieved with a simple but theoretically motivated two-stage approach: an autoencoder targeting MSE, followed by a score-based decoder. However, as we will show, implementation details matter, and the optimal design decisions can differ greatly from those of typical text-to-image models.
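To make the two-stage idea concrete, the following is a minimal PyTorch sketch, not the paper's actual architecture: a small MSE autoencoder stands in for the first stage, and an epsilon-prediction denoiser conditioned on the autoencoder's reconstruction stands in for the score-based decoder. All module names, layer sizes, and the noise schedule are illustrative assumptions.

```python
# Minimal sketch of a two-stage compression decoder: an MSE autoencoder produces a
# coarse reconstruction, and a conditional diffusion ("score-based") model refines it.
# Names and hyperparameters are illustrative, not the paper's.
import torch
import torch.nn as nn


class MSEAutoencoder(nn.Module):
    """Stage 1: transform coding trained with an MSE distortion term."""

    def __init__(self, channels=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2, padding=2), nn.GELU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 5, stride=2, padding=2, output_padding=1), nn.GELU(),
            nn.ConvTranspose2d(channels, 3, 5, stride=2, padding=2, output_padding=1),
        )

    def forward(self, x):
        y = self.encoder(x)
        # Straight-through rounding stands in for quantization + entropy coding.
        y_hat = y + (torch.round(y) - y).detach()
        return self.decoder(y_hat)


class ConditionalDenoiser(nn.Module):
    """Stage 2: predicts the noise in x_noisy, given the coarse reconstruction and a timestep."""

    def __init__(self, channels=64):
        super().__init__()
        self.in_conv = nn.Conv2d(6, channels, 3, padding=1)
        self.time_embed = nn.Linear(1, channels)
        self.body = nn.Sequential(
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x_noisy, x_coarse, t):
        # Condition on the coarse MSE reconstruction by channel concatenation.
        h = self.in_conv(torch.cat([x_noisy, x_coarse], dim=1))
        h = h + self.time_embed(t.float()[:, None])[:, :, None, None]
        return self.body(h)


def diffusion_loss(denoiser, x, x_coarse, alphas_cumprod):
    """DDPM-style epsilon-prediction loss, conditioned on the stage-1 reconstruction."""
    t = torch.randint(0, alphas_cumprod.shape[0], (x.shape[0],))
    a = alphas_cumprod[t][:, None, None, None]
    eps = torch.randn_like(x)
    x_noisy = a.sqrt() * x + (1.0 - a).sqrt() * eps
    return ((denoiser(x_noisy, x_coarse, t) - eps) ** 2).mean()


if __name__ == "__main__":
    x = torch.rand(2, 3, 64, 64)                  # toy batch of images in [0, 1]
    ae = MSEAutoencoder()
    denoiser = ConditionalDenoiser()
    x_coarse = ae(x).detach()                     # stage 1 output; in a two-stage setup the AE is trained first
    betas = torch.linspace(1e-4, 0.02, 1000)      # standard DDPM noise schedule
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    loss = diffusion_loss(denoiser, x, x_coarse, alphas_cumprod)
    loss.backward()
```

At decoding time, one would run a standard ancestral sampling loop of the denoiser conditioned on the transmitted coarse reconstruction, rather than returning that reconstruction directly; the bitstream itself is produced entirely by the MSE autoencoder in this reading of the abstract.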

[1] Ting Chen, On the Importance of Noise Scheduling for Diffusion Models, 2023, arXiv.

[2] Tim Salimans et al., simple diffusion: End-to-end diffusion for high resolution images, 2023, ICML.

[3] G. Toderici et al., Multi-Realism Image Compression with a Conditional Generator, 2022, arXiv.

[4] David J. Fleet et al., Scalable Adaptive Computation for Iterative Generation, 2022, ICML.

[5] I. Laptev et al., Image Compression with Product Quantized Masked Image Modeling, 2022, Trans. Mach. Learn. Res.

[6] David J. Fleet et al., Imagen Video: High Definition Video Generation with Diffusion Models, 2022, arXiv.

[7] S. Mandt et al., Lossy Image Compression with Conditional Diffusion Models, 2022, arXiv.

[8] Chengyue Gong et al., Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow, 2022, ICLR.

[9] Xiaoming Tao et al., Toward Semantic Communications: Deep Learning-Based Image Semantic Coding, 2022, IEEE Journal on Selected Areas in Communications.

[10] Jing Yu Koh et al., Scaling Autoregressive Models for Content-Rich Text-to-Image Generation, 2022, Trans. Mach. Learn. Res.

[11] Tim Salimans et al., Lossy Compression with Gaussian Diffusion, 2022, arXiv.

[12] Hongwei Qin et al., PO-ELIC: Perception-Oriented Efficient Learned Image Coding, 2022, IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[13] David J. Fleet et al., Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, 2022, NeurIPS.

[14] Prafulla Dhariwal et al., Hierarchical Text-Conditional Image Generation with CLIP Latents, 2022, arXiv.

[15] Hongwei Qin et al., ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding, 2022, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Tim Salimans et al., Progressive Distillation for Fast Sampling of Diffusion Models, 2022, ICLR.

[17] B. Ommer et al., High-Resolution Image Synthesis with Latent Diffusion Models, 2022, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18] David J. Fleet et al., Palette: Image-to-Image Diffusion Models, 2021, SIGGRAPH.

[19] Lucas Theis et al., Algorithms for the Communication of Samples, 2021, ICML.

[20] Ashish Khisti et al., Universal Rate-Distortion-Perception Representations for Lossy Compression, 2021, NeurIPS.

[21] Prafulla Dhariwal et al., Diffusion Models Beat GANs on Image Synthesis, 2021, NeurIPS.

[22] David J. Fleet et al., Image Super-Resolution via Iterative Refinement, 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23] Eirikur Agustsson et al., On the advantages of stochastic encoders, 2021, arXiv.

[24] Abhishek Kumar et al., Score-Based Generative Modeling through Stochastic Differential Equations, 2020, ICLR.

[25] Jiaming Song et al., Denoising Diffusion Implicit Models, 2020, ICLR.

[26] David Minnen et al., Channel-Wise Autoregressive Entropy Models for Learned Image Compression, 2020, IEEE International Conference on Image Processing (ICIP).

[27] Pieter Abbeel et al., Denoising Diffusion Probabilistic Models, 2020, NeurIPS.

[28] Eirikur Agustsson et al., High-Fidelity Generative Image Compression, 2020, NeurIPS.

[29] R. Manmatha et al., Saliency Driven Perceptual Image Compression, 2021, IEEE Winter Conference on Applications of Computer Vision (WACV).

[30] Haibin Shen et al., A GAN-based Tunable Image Compression System, 2020, IEEE Winter Conference on Applications of Computer Vision (WACV).

[31] Yang Song et al., Generative Modeling by Estimating Gradients of the Data Distribution, 2019, NeurIPS.

[32] Robert Peharz et al., Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters, 2018, ICLR.

[33] L. Gool et al., Generative Adversarial Networks for Extreme Learned Image Compression, 2018, IEEE International Conference on Computer Vision (ICCV).

[34] Y. Blau et al., The Perception-Distortion Tradeoff, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Sepp Hochreiter et al., GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2017, NIPS.

[36] Lubomir D. Bourdev et al., Real-Time Adaptive Image Compression, 2017, ICML.

[37] Nir Shavit et al., Generative Compression, 2018, Picture Coding Symposium (PCS).

[38] Christian Ledig et al., Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39] Thomas Brox et al., U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015, MICCAI.

[40] Surya Ganguli et al., Deep Unsupervised Learning using Nonequilibrium Thermodynamics, 2015, ICML.

[41] Aaron C. Courville et al., Generative Adversarial Nets, 2014, NIPS.

[42] Pietro Perona et al., Microsoft COCO: Common Objects in Context, 2014, ECCV.

[43] Tianlin Xu et al., Neural Image Compression with a Diffusion-Based Decoder, 2023, arXiv.

[44] Diederik P. Kingma et al., On Density Estimation with Diffusion Models, 2021, NeurIPS.