Joint Separable and Non-Separable Transforms for Next-Generation Video Coding

Over the past few decades, the separable discrete cosine transform (DCT), particularly the DCT type II, has been widely used in image and video compression. It is well known that, under first-order stationary Markov conditions, the DCT is an efficient approximation of the optimal Karhunen–Loève transform. For natural image and video sources, however, a single separable transform with a fixed core adapts poorly to highly dynamic image statistics, e.g., textures and arbitrarily directed edges. It is also known that non-separable transforms can achieve better compression efficiency for images with directional texture patterns, yet they are computationally complex, especially when the transform size is large. To achieve higher transform coding gains with relatively low-complexity implementations, we propose a joint separable and non-separable transform. The proposed separable primary transform, named enhanced multiple transform (EMT), applies multiple transform cores from a pre-defined subset of sinusoidal transforms, and the transform selection is signaled jointly at the block level. Moreover, a non-separable secondary transform (NSST) method is proposed to operate in conjunction with EMT. Unlike existing non-separable transform schemes, which require excessive amounts of memory and computation, the proposed NSST improves coding gain with much lower complexity. Extensive experimental results show that the proposed methods, applied in a state-of-the-art video codec such as High Efficiency Video Coding (HEVC), can provide significant coding gains (average bitrate reductions of 6.9% and 4.5% for intra and random-access coding, respectively).
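To make the complexity contrast between separable and non-separable transforms concrete, the following is a minimal sketch in plain Python. It is not the EMT or NSST design described above; it only illustrates the general fact that a separable 2-D transform Y = C X C^T is equivalent to one large non-separable matrix (the Kronecker product of C with itself) applied to the vectorized block, and that the direct matrix-form multiplication counts grow as 2N^3 versus N^4. All function names here are hypothetical and chosen for illustration.

```python
import math

def dct2_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (each row is a basis vector)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def separable_transform(block, core):
    """Separable 2-D transform: Y = C X C^T (columns first, then rows)."""
    return matmul(matmul(core, block), transpose(core))

def kron(a, b):
    """Kronecker product of two matrices."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def nonseparable_transform(block, big):
    """Apply one (n*n) x (n*n) matrix to the row-major vectorized block."""
    n = len(block)
    vec = [x for row in block for x in row]
    out = [sum(big[r][c] * vec[c] for c in range(n * n)) for r in range(n * n)]
    return [out[i * n:(i + 1) * n] for i in range(n)]

def mult_counts(n):
    """Multiplications for direct matrix forms: (separable, non-separable)."""
    return 2 * n ** 3, n ** 4

if __name__ == "__main__":
    n = 4
    core = dct2_matrix(n)
    # Arbitrary illustrative block, not data from the paper.
    block = [[float((3 * r + 5 * c) % 7) for c in range(n)] for r in range(n)]
    y_sep = separable_transform(block, core)
    y_non = nonseparable_transform(block, kron(core, core))
    max_diff = max(abs(y_sep[i][j] - y_non[i][j])
                   for i in range(n) for j in range(n))
    print("max difference:", max_diff)
    for size in (4, 8, 32):
        print(size, mult_counts(size))
```

A general non-separable transform is not restricted to Kronecker-product form, which is where its extra adaptivity to directional patterns comes from, but it still pays the N^4 cost in this direct form. That cost gap widens quickly with block size, which is consistent with the abstract's remark that non-separable transforms become expensive for large transforms.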
