Estimating the Resize Parameter in End-to-end Learned Image Compression

We describe a search-free resizing framework that can further improve the rate-distortion tradeoff of recent learned image compression models. Our approach is simple: a pair of differentiable downsampling/upsampling layers sandwiches a neural compression model. To determine resize factors for different inputs, we use another neural network that is trained jointly with the compression model to minimize the rate-distortion objective. Our results suggest that “compression friendly” downsampled representations can be determined quickly during encoding by using an auxiliary network and differentiable image warping. Extensive experiments on existing deep image compression models show that our resizing parameter estimation framework can provide a Bjøntegaard-Delta rate (BD-rate) improvement of about 10% against leading perceptual quality engines. We also carried out a subjective quality study, the results of which show that our new approach yields favorable compressed images. To facilitate reproducible research in this direction, the implementation used in this paper is being made freely available online at: https://github.com/treammm/ResizeCompression.
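The downsample/compress/upsample sandwich can be illustrated with a minimal NumPy sketch. This is not the released implementation. The function names (`bilinear_resize`, `toy_codec`, `sandwich`) are hypothetical, the "codec" is plain uniform quantization standing in for a learned compression model, and the resize factor is passed in by hand rather than predicted by an auxiliary network. A real training setup would presumably use an autodiff framework so that gradients of the rate-distortion objective can flow through the warping step.

```python
import numpy as np


def bilinear_resize(img, out_h, out_w):
    """Resample a 2-D array onto an out_h x out_w grid (bilinear, corners aligned)."""
    in_h, in_w = img.shape
    # Sampling coordinates in the input image for every output row/column.
    ys = np.linspace(0.0, in_h - 1.0, out_h)
    xs = np.linspace(0.0, in_w - 1.0, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def toy_codec(img, step=0.1):
    """Hypothetical stand-in for a learned codec: uniform quantization."""
    return np.round(img / step) * step


def sandwich(img, scale, codec=toy_codec):
    """Downsample by `scale`, run the codec, upsample back to the input size."""
    h, w = img.shape
    small_h = max(1, int(round(h * scale)))
    small_w = max(1, int(round(w * scale)))
    small = bilinear_resize(img, small_h, small_w)
    decoded = codec(small)
    return bilinear_resize(decoded, h, w), (small_h, small_w)


if __name__ == "__main__":
    image = np.arange(64, dtype=float).reshape(8, 8) / 63.0
    for s in (1.0, 0.75, 0.5):
        recon, size = sandwich(image, s)
        mse = float(np.mean((recon - image) ** 2))
        print(f"scale={s:.2f} coded size={size} mse={mse:.5f}")
```

In this sketch, a smaller `scale` means fewer samples reach the codec, which stands in for a lower rate, while the mean squared error after upsampling stands in for distortion. Note that rounding the target size to integers is a simplification and is not differentiable with respect to `scale`.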
