Saliency Driven Perceptual Image Compression

This paper proposes a new end-to-end trainable model for lossy image compression that includes several novel components: 1) an adequate perceptual similarity metric; 2) image saliency; 3) a hierarchical auto-regressive model. The paper demonstrates that widely used evaluation metrics such as MS-SSIM and PSNR are inadequate for judging the performance of image compression techniques because they do not align with human perception of similarity. Instead, a new metric is proposed, learned from perceptual similarity data specific to image compression. The proposed compression model incorporates salient regions and is optimized on the proposed perceptual similarity metric. Compared with existing engineered or learned compression techniques, the model not only generates images that are visually better but also gives superior performance on subsequent computer vision tasks such as object detection and segmentation.
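The claim that PSNR does not track perceived similarity can be illustrated with a small toy case. The sketch below is not taken from the paper: the flat gray image, the two distortions, and the names `psnr`, `shifted`, and `checker` are assumptions chosen for illustration. It computes the standard PSNR for a uniform brightness shift and for a checkerboard pattern of the same amplitude; both have the same mean squared error and therefore the same PSNR, although the two distortions look quite different.

```python
import math

def psnr(reference, distorted, max_value=255.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical 64x64 grayscale image stored row by row: flat mid-gray.
size = 64
reference = [128.0] * (size * size)

# Distortion A (assumed for illustration): every pixel brightened by 8.
shifted = [p + 8.0 for p in reference]

# Distortion B (assumed for illustration): +8 / -8 in a checkerboard pattern.
checker = [
    p + (8.0 if (i // size + i % size) % 2 == 0 else -8.0)
    for i, p in enumerate(reference)
]

psnr_shift = psnr(reference, shifted)
psnr_checker = psnr(reference, checker)

# Both distortions have a squared error of 64 at every pixel,
# so PSNR cannot tell them apart.
print(psnr_shift == psnr_checker)
```

Because PSNR depends only on the per-pixel squared error, any two distortions with equal mean squared error receive the same score regardless of their spatial structure, which is one way a pixel-wise metric can diverge from human judgments.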
