Transform-domain temporal prediction in video coding: Exploiting correlation variation across coefficients

Temporal prediction in standard video coding is performed in the spatial domain, where each pixel is predicted from a motion-compensated reconstructed pixel in a prior frame. This paper is premised on the realization that such prediction treats each pixel independently and ignores the underlying spatial correlations, whereas transform-domain prediction removes much of the spatial correlation before the signal components (transform coefficients) are independently predicted. Moreover, the true temporal correlations emerge after signal decomposition, and vary considerably from low- to high-frequency components. The precise nature of these temporal dependencies is entirely masked in spatial-domain prediction by the high temporal correlation coefficient (ρ ≈ 1) imposed on all pixels by the dominant low-frequency components. We derive optimal transform-domain per-coefficient predictors for three main settings: basic inter-frame prediction, bi-directional prediction, and enhancement-layer prediction in scalable coding. Experimental results provide evidence of substantial performance gains in all settings.
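
As a rough illustration of what per-coefficient prediction means, the following Python sketch assumes a simple model in which each transform coefficient of the current block is predicted by scaling the co-located coefficient of the motion-compensated reference block by its own factor ρ_k. The function names (`dct_1d`, `estimate_rho`, `predict_coefficients`), the 1-D DCT, and the least-squares estimate of ρ_k are assumptions made for illustration; they are not the predictors derived in the paper.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a 1-D list of samples (illustrative transform)."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(samples)
        )
        coeffs.append(scale * total)
    return coeffs


def estimate_rho(current_blocks, reference_blocks):
    """Estimate one scaling factor per coefficient index.

    Assumption: rho_k is the least-squares factor that best maps the
    reference coefficient k onto the current coefficient k over a set of
    training block pairs. Both arguments are lists of coefficient lists.
    """
    n = len(current_blocks[0])
    rho = []
    for k in range(n):
        num = sum(c[k] * r[k] for c, r in zip(current_blocks, reference_blocks))
        den = sum(r[k] * r[k] for r in reference_blocks)
        rho.append(num / den if den > 0 else 0.0)
    return rho


def predict_coefficients(reference_coeffs, rho):
    """Predict each coefficient independently from its reference counterpart."""
    return [p * r for p, r in zip(rho, reference_coeffs)]
```

With `rho` fixed to 1 for every coefficient, `predict_coefficients` returns the reference coefficients unchanged, which for an orthonormal transform corresponds to copying the motion-compensated reference pixels. Letting ρ_k differ across coefficients is what allows the lower temporal correlation at high frequencies to be reflected in the prediction.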
