End-to-end optimization of nonlinear transform codes for perceptual quality

We introduce a general framework for end-to-end optimization of the rate-distortion performance of nonlinear transform codes assuming scalar quantization. The framework can be used to optimize any differentiable pair of analysis and synthesis transforms in combination with any differentiable perceptual metric. As an example, we consider a code built from a linear transform followed by a form of multi-dimensional local gain control. Distortion is measured with a state-of-the-art perceptual metric. When optimized over a large database of images, this representation offers substantial improvements in bitrate and perceptual appearance over fixed (DCT) codes, and over linear transform codes optimized for mean squared error.
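To make the pieces named above concrete, the following is a minimal sketch of a nonlinear transform code with scalar quantization and a rate-distortion objective of the form rate + lambda * distortion. It is an illustration under stated assumptions, not the authors' implementation. The analysis transform is a 2-point Haar transform followed by a simple joint divisive gain control, the rate is an empirical entropy estimate, and mean squared error stands in for the perceptual metric. All function names and the parameters beta, gamma, step, and lam are hypothetical. The sketch only evaluates the objective and does not optimize it.

```python
import math
from collections import Counter

SQRT2 = math.sqrt(2.0)


def analysis(x, beta, gamma):
    """Linear transform (2-point Haar) followed by a joint divisive gain control."""
    y0 = (x[0] + x[1]) / SQRT2
    y1 = (x[0] - x[1]) / SQRT2
    s = math.sqrt(beta + gamma * (y0 * y0 + y1 * y1))
    return (y0 / s, y1 / s)


def synthesis(z, beta, gamma):
    """Invert the gain control in closed form, then invert the Haar transform."""
    # Clamp so the inverse stays finite if quantization pushes z out of range.
    n = min(gamma * (z[0] ** 2 + z[1] ** 2), 1.0 - 1e-6)
    s = math.sqrt(beta / (1.0 - n))
    y0, y1 = z[0] * s, z[1] * s
    return ((y0 + y1) / SQRT2, (y0 - y1) / SQRT2)


def entropy_bits(indices):
    """Empirical entropy of the quantization indices, in bits per coefficient."""
    counts = Counter(indices)
    total = len(indices)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rd_loss(blocks, step, lam, beta=1.0, gamma=0.1):
    """Rate + lam * distortion, with MSE as a stand-in for a perceptual metric."""
    indices, sq_err, count = [], 0.0, 0
    for x in blocks:
        z = analysis(x, beta, gamma)
        q = [round(c / step) for c in z]          # uniform scalar quantization
        indices.extend(q)
        x_hat = synthesis([i * step for i in q], beta, gamma)
        sq_err += sum((a - b) ** 2 for a, b in zip(x, x_hat))
        count += len(x)
    return entropy_bits(indices) + lam * sq_err / count


if __name__ == "__main__":
    blocks = [(3.0, 1.0), (0.5, -0.5), (2.0, 2.0), (1.0, 4.0)]
    for step in (0.1, 0.5, 1.0):
        print(step, rd_loss(blocks, step, lam=1.0))
```

Rounding is not differentiable, so a gradient-based optimizer cannot be applied to this objective as written. An end-to-end optimization would need a differentiable treatment of the quantizer and of the rate estimate, which this sketch deliberately leaves out.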
