Highly Parallel Rate-Distortion Optimized Intra-Mode Decision on Multicore Graphics Processors

Rate-distortion (RD)-based mode selection is an important technique in video coding. In this approach, an encoder computes the RD costs of all candidate coding modes and selects the one that achieves the best trade-off between encoding rate and compression distortion. Previous work has demonstrated that RD-based mode selection can lead to significant improvements in coding efficiency. It also incurs a considerable increase in encoding complexity, however, since the RD cost must be computed for numerous candidate coding modes. In this paper, we consider the scenario where software-based video encoding is performed on personal computers or game consoles, and investigate how multicore graphics processing units (GPUs) may be efficiently utilized to undertake RD-optimized intra-prediction mode selection in Audio Video coding Standard (AVS) and H.264 video encoding. Achieving efficient GPU-based intra-mode decision is nontrivial for two reasons. First, intra-mode decision tends to be sequential: the mode decision of the current block depends on the reconstructed data of the neighboring blocks, so the coding modes of the neighboring blocks must be determined before that of the current block. This dependency poses challenges to GPU-based computation, which relies heavily on parallel data processing to achieve superior speedups. Second, RD-based intra-mode decision may require conditional branching to determine the encoding bit rate, and these branching operations may incur substantial performance penalties when executed on GPUs because of their pipelined architecture. To address these issues, we analyze the data dependency in intra-mode decision and propose novel greedy-based encoding orders to achieve highly parallel processing of data blocks. We also prove that the proposed greedy-based orders are optimal for our problem, i.e., they require the minimum number of iterations to process a video frame given the dependency constraints. In addition, we propose a method to estimate the coding rate that is suitable for GPU implementation. Experimental results suggest that our proposed solution can be more than 50 times faster than the previously proposed parallel intra-prediction, since our work can efficiently exploit the massive parallelism available in GPUs.
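
The idea of a dependency-driven greedy encoding order can be illustrated with a minimal sketch. It assumes that each block depends on its left, upper-left, upper, and upper-right neighbors, a pattern typical of intra prediction but not necessarily the exact dependency graph analyzed in the paper. The names `greedy_order` and `DEPS` are hypothetical. In every iteration the sketch schedules all blocks whose in-frame dependencies are already complete, so the blocks within one iteration could be processed in parallel.

```python
# Assumed dependency pattern (not taken from the paper): each block needs its
# left, upper-left, upper, and upper-right neighbors, given as (dx, dy) offsets.
DEPS = [(-1, 0), (-1, -1), (0, -1), (1, -1)]


def greedy_order(width, height, deps=DEPS):
    """Group the blocks of a width x height grid into parallel iterations.

    Each iteration contains every not-yet-processed block whose in-frame
    dependencies were all processed in earlier iterations.
    """
    def in_frame(x, y):
        return 0 <= x < width and 0 <= y < height

    done = set()
    remaining = {(x, y) for y in range(height) for x in range(width)}
    iterations = []
    while remaining:
        # Greedy step: take all blocks that are ready right now.
        ready = sorted(
            (x, y) for (x, y) in remaining
            if all((x + dx, y + dy) in done
                   for dx, dy in deps if in_frame(x + dx, y + dy))
        )
        iterations.append(ready)
        done.update(ready)
        remaining.difference_update(ready)
    return iterations


if __name__ == "__main__":
    for i, blocks in enumerate(greedy_order(4, 3)):
        print(f"iteration {i}: {blocks}")
```

Under the assumed dependency pattern, a grid of 4 x 3 blocks is processed in 8 iterations rather than 12 sequential steps, and the gap widens as the frame grows.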
