ProxIQA: A Proxy Approach to Perceptual Optimization of Learned Image Compression

The $\ell_p$ norms ($p = 1, 2$) have largely dominated the measurement of loss in neural networks because of their simplicity and analytical properties. However, when used to assess the loss of visual information, these simple norms are not very consistent with human perception. Here, we describe a different “proximal” approach to optimizing image analysis networks against quantitative perceptual models. Specifically, we construct a proxy network, broadly termed ProxIQA, which mimics the perceptual model while serving as a loss layer of the network. We experimentally demonstrate how this optimization framework can be applied to train an end-to-end optimized image compression network. By building on top of an existing deep image compression model, we demonstrate a bitrate reduction of as much as 31% over MSE optimization at a specified perceptual quality (VMAF) level.
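The proxy idea can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and not the ProxIQA implementation: `black_box_quality` is a hypothetical stand-in for a perceptual model (it is not VMAF), the "codec" is a single gain parameter, the proxy is a straight-line fit where the paper uses a network, and the rate term of a real compression objective is omitted. The sketch only shows the alternation of (1) fitting a differentiable proxy to the scores of a non-differentiable quality model near the codec's current outputs, and (2) updating the codec using the proxy's gradient.

```python
def mse(ref, rec):
    """Mean squared error between two equal-length signals."""
    return sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)


def black_box_quality(ref, rec):
    """Hypothetical perceptual model (NOT VMAF): a 0..100 score rounded to an
    integer, so it is piecewise constant and offers no usable gradient."""
    return float(round(max(0.0, 100.0 - 100.0 * mse(ref, rec))))


def fit_proxy(ref, gain, offsets=(-0.2, -0.1, 0.0, 0.1, 0.2)):
    """Fit a linear proxy  score ~ w0 + w1 * mse  by least squares, using
    black-box scores of reconstructions near the codec's current output."""
    xs, ys = [], []
    for d in offsets:
        rec = [(gain + d) * a for a in ref]
        xs.append(mse(ref, rec))
        ys.append(black_box_quality(ref, rec))
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    var = sum((x - x_mean) ** 2 for x in xs)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w1 = cov / var if var > 0 else 0.0
    w0 = y_mean - w1 * x_mean
    return w0, w1


def train_codec(ref, gain=0.5, steps=60, lr=0.005):
    """Alternate proxy fitting and codec updates.

    The toy codec reconstructs rec = gain * ref, so
    mse(gain) = (gain - 1)^2 * energy, and the loss is the negated proxy
    score: loss = -(w0 + w1 * mse(gain)).
    """
    energy = sum(a * a for a in ref) / len(ref)
    for _ in range(steps):
        _, w1 = fit_proxy(ref, gain)                # step 1: update the proxy
        grad = -w1 * 2.0 * (gain - 1.0) * energy    # d(loss)/d(gain) via proxy
        gain -= lr * grad                           # step 2: update the codec
    return gain


ref = [0.1, 0.4, 0.7, 0.9]
gain = train_codec(ref)
print(gain, black_box_quality(ref, [gain * a for a in ref]))
```

The proxy is refitted at every step because it is only trusted near the codec's current outputs; a proxy fitted once could drift away from the perceptual model as the codec changes.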
