End-to-end Optimized Image Compression

We describe an image compression method consisting of a nonlinear analysis transformation, a uniform quantizer, and a nonlinear synthesis transformation. The transforms are constructed in three successive stages of convolutional linear filters and nonlinear activation functions. Unlike most convolutional neural networks, the joint nonlinearity is chosen to implement a form of local gain control, inspired by the gain control mechanisms used to model biological neurons. Using a variant of stochastic gradient descent, we jointly optimize the entire model for rate-distortion performance over a database of training images, introducing a continuous proxy for the discontinuous loss function arising from the quantizer. Under certain conditions, the relaxed loss function may be interpreted as the log likelihood of a generative model, as implemented by a variational autoencoder. Unlike such models, however, the compression model must operate at any given point along the rate-distortion curve, as specified by a trade-off parameter. Across an independent set of test images, we find that the optimized method generally exhibits better rate-distortion performance than the standard JPEG and JPEG 2000 compression methods. More importantly, we observe a dramatic improvement in visual quality for all images at all bit rates, supported by objective quality estimates using MS-SSIM.
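To make the two key ideas in the abstract concrete, the sketch below shows (a) a divisive-normalization nonlinearity implementing local gain control, and (b) the training relaxation in which hard rounding is replaced by additive uniform noise so that gradients can flow through the quantizer, with a weighted rate-distortion objective R + λ·D. This is a minimal illustration in PyTorch under stated assumptions, not the authors' released implementation: the names `GDN`, `Codec`, and `rd_loss`, the channel count, kernel sizes, the Gaussian rate surrogate, and the value of λ are all illustrative choices of ours.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class GDN(nn.Module):
    """Generalized divisive normalization, a local gain control across
    channels: y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j^2)."""

    def __init__(self, channels: int, eps: float = 1e-6):
        super().__init__()
        self.beta = nn.Parameter(torch.ones(channels))
        self.gamma = nn.Parameter(0.1 * torch.eye(channels))
        self.eps = eps

    def forward(self, x):
        # Treat gamma as a 1x1 convolution applied to the squared inputs.
        c = x.shape[1]
        norm = F.conv2d(x * x, self.gamma.view(c, c, 1, 1), bias=self.beta)
        return x * torch.rsqrt(norm.clamp_min(self.eps))


class Codec(nn.Module):
    """Three conv+GDN stages for analysis, mirrored for synthesis.
    (The paper uses an approximate inverse, IGDN, in the synthesis
    transform; GDN is reused here purely for brevity.)"""

    def __init__(self, ch: int = 128):
        super().__init__()
        self.analysis = nn.Sequential(
            nn.Conv2d(3, ch, 9, stride=4, padding=4), GDN(ch),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2), GDN(ch),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2), GDN(ch),
        )
        self.synthesis = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 5, 2, padding=2, output_padding=1), GDN(ch),
            nn.ConvTranspose2d(ch, ch, 5, 2, padding=2, output_padding=1), GDN(ch),
            nn.ConvTranspose2d(ch, 3, 9, 4, padding=4, output_padding=3),
        )

    def forward(self, x):
        y = self.analysis(x)
        if self.training:
            # Continuous proxy: additive uniform noise on [-1/2, 1/2)
            # stands in for rounding, so gradients can flow.
            y_hat = y + torch.rand_like(y) - 0.5
        else:
            y_hat = torch.round(y)  # the actual uniform quantizer
        return self.synthesis(y_hat), y_hat


def rd_loss(x, x_hat, y_hat, lmbda=0.01):
    """Weighted rate-distortion objective R + lambda * D."""
    distortion = F.mse_loss(x_hat, x)
    # Crude Gaussian surrogate for the rate -E[log2 p(y_hat)]; the paper
    # instead fits a flexible parametric density model to the latents.
    rate = 0.5 * torch.log2(2 * math.pi * math.e * y_hat.var() + 1e-9)
    return rate + lmbda * distortion
```

Training the same architecture at several values of λ yields a family of models targeting different points on the rate-distortion curve, consistent with the abstract's observation that, unlike a variational autoencoder trained for a single likelihood objective, the compression model must operate at any specified rate-distortion trade-off.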
