Paralleling variable block size motion estimation of HEVC on CPU plus GPU platform

The emerging HEVC standard supports up to 12 variable block sizes, ranging from 4×8/8×4 to 64×64, for motion estimation (ME) and motion compensation (MC). Compared with the 7 block sizes of H.264/AVC, this feature yields considerable coding gain, but at the cost of much higher computational complexity. In the HM test model, variable block size ME (VBSME) may be invoked up to 425 times during the mode decision of a single Coding Tree Unit (CTU), which makes VBSME the bottleneck for real-time encoding. This paper focuses on the design of a parallel architecture for VBSME in HEVC. First, an efficient parallel encoder framework is proposed for a CPU plus GPU platform; within this framework, VBSME, fractional-pixel image interpolation, and border padding run on the GPU without burdening the host CPU. Second, to balance the workload between CPU and GPU, a fast Prediction Unit (PU) partition mode decision algorithm is proposed. Last, the parallel realization strategy of VBSME on the GPU is refined to improve the compression performance of ME. Experimental results on an NVIDIA C2050 GPU show that the GPU VBSME strategy runs about 113 times faster than its CPU counterpart.
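The abstract does not describe how the GPU kernel is organized, so the following is only a minimal sketch of one technique commonly used in parallel VBSME, namely SAD reuse: the sum of absolute differences (SAD) is computed once per 4×4 sub-block for a candidate motion vector, and the SADs of the 12 larger block sizes are then obtained by adding neighbouring partial sums. It is written as plain Python to show the data flow and is not presented as the authors' method. All names (`sad_4x4`, `merge_h`, `merge_v`, `vbs_sads`, `pad`) are hypothetical, and the reference area is assumed to be padded by `pad` samples on every side.

```python
def sad_4x4(cur, ref, pad, x0, y0, dx, dy):
    """SAD of the 4x4 block of `cur` at (x0, y0) against `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[y0 + j][x0 + i] - ref[pad + y0 + j + dy][pad + x0 + i + dx])
        for j in range(4)
        for i in range(4)
    )


def merge_h(grid):
    """Add horizontally adjacent pairs: block width doubles, height is unchanged."""
    return [[row[2 * k] + row[2 * k + 1] for k in range(len(row) // 2)] for row in grid]


def merge_v(grid):
    """Add vertically adjacent pairs: block height doubles, width is unchanged."""
    return [
        [a + b for a, b in zip(grid[2 * k], grid[2 * k + 1])]
        for k in range(len(grid) // 2)
    ]


def vbs_sads(cur, ref, pad, dx, dy, ctu=64):
    """Return {(width, height): grid of SADs} for one candidate motion vector.

    `cur` is a ctu x ctu block; `ref` is (ctu + 2*pad) x (ctu + 2*pad) samples
    (assumed layout). Grids are indexed as grid[block_row][block_col].
    """
    if abs(dx) > pad or abs(dy) > pad:
        raise ValueError("motion vector exceeds the padded search area")

    # Step 1: one SAD per 4x4 sub-block (the only pass that touches pixels).
    square = [
        [sad_4x4(cur, ref, pad, x, y, dx, dy) for x in range(0, ctu, 4)]
        for y in range(0, ctu, 4)
    ]

    # Step 2: build every larger size from partial sums, never re-reading pixels.
    sads = {}
    size = 4
    while size < ctu:
        wide = merge_h(square)   # blocks of (2*size) x size
        tall = merge_v(square)   # blocks of size x (2*size)
        square = merge_v(wide)   # blocks of (2*size) x (2*size)
        sads[(2 * size, size)] = wide
        sads[(size, 2 * size)] = tall
        sads[(2 * size, 2 * size)] = square
        size *= 2
    return sads
```

For a 64×64 CTU this yields exactly the 12 block sizes from 8×4/4×8 up to 64×64; the 4×4 grid serves only as an intermediate result. In a GPU setting, the per-block SADs and the merge steps are independent across blocks and across candidate motion vectors, which is why this structure is often mapped to one thread per sub-block or per candidate.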
