Parallel rate-distortion optimized intra mode decision on multi-core graphics processors using greedy-based encoding orders

Rate-distortion (RD) optimized intra-prediction mode selection can significantly improve coding efficiency in intra-frame encoding, but it also considerably increases encoding complexity. In this paper, we investigate how multi-core Graphics Processing Units (GPUs) can be used efficiently for RD optimized intra mode selection in AVS and H.264 video encoding. Achieving efficient GPU-based intra mode decision is non-trivial, because the mode decision of the current block depends on the reconstructed data of its neighboring blocks: the coding modes of the neighboring blocks must be computed before that of the current block can be determined. This dependency poses a challenge to computation on multi-core GPUs, which rely heavily on parallel data processing to achieve their speedups. To address this issue, we analyze the data dependency in intra mode decision and propose novel greedy-based encoding orders that achieve highly parallel processing. We also prove that the proposed greedy-based orders are optimal in terms of execution time. Experimental results suggest that the proposed GPU-based intra mode decision compares favorably to its counterpart implemented on a single-core CPU.
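The idea of a greedy encoding order can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes, for illustration only, that each block depends on its left, top-left, top, and top-right neighbors (a pattern typical of H.264-style intra prediction), and the names `NEIGHBOR_OFFSETS` and `greedy_encoding_order` are hypothetical. At every step, the sketch schedules each block as soon as all of its neighbors have been processed, so all blocks in one step could be handled in parallel.

```python
# Assumed dependency pattern (an illustration, not taken from the paper):
# a block at (x, y) needs its left, top-left, top, and top-right neighbors.
NEIGHBOR_OFFSETS = [(-1, 0), (-1, -1), (0, -1), (1, -1)]


def greedy_encoding_order(width, height):
    """Group the blocks of a width x height grid into parallel steps.

    A block is placed in the earliest step at which all of its
    in-frame neighbors (per NEIGHBOR_OFFSETS) are already processed.
    Returns a list of steps; each step is a list of (x, y) blocks.
    """
    step_of = {}
    # Raster order guarantees every dependency is visited before its dependent.
    for y in range(height):
        for x in range(width):
            deps = [
                (x + dx, y + dy)
                for dx, dy in NEIGHBOR_OFFSETS
                if 0 <= x + dx < width and 0 <= y + dy < height
            ]
            # Earliest feasible step: one after the latest dependency.
            step_of[(x, y)] = 1 + max((step_of[d] for d in deps), default=-1)

    steps = [[] for _ in range(max(step_of.values()) + 1)]
    for block, step in step_of.items():
        steps[step].append(block)
    return steps


if __name__ == "__main__":
    for t, blocks in enumerate(greedy_encoding_order(4, 4)):
        print(t, blocks)
```

Under this assumed dependency pattern, block (x, y) lands in step x + 2y, so a 4x4 grid of blocks needs 10 parallel steps rather than 16 sequential ones. A different dependency pattern would yield a different schedule.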
