Multiple layer parallel motion estimation on GPU for High Efficiency Video Coding (HEVC)

This paper presents a multiple-layer parallel motion estimation (ME) scheme implemented on GPU for High Efficiency Video Coding (HEVC). The scheme is hierarchical and comprises four layers: coding tree unit (CTU), prediction unit (PU), motion vector (MV) selection, and instruction optimization. In the PU layer, the costs of the various PU sizes are obtained from a sum of absolute differences (SAD) look-up table instead of progressive cost merging. During MV selection, the GPU's comparison instruction is used to avoid branches. Concurrent CTU processing and SIMD (Single Instruction, Multiple Data) optimization also improve performance significantly. Experimental results show that the proposed scheme takes full advantage of the GPU and achieves a speedup of over 90 times compared with HM10.0 using fast ME.
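The PU-layer and MV-selection ideas can be illustrated with a small CPU-side sketch. It is not the paper's GPU kernel: the 16x16 block size, the 4x4 table granularity, the function names, and the packed-key trick for selection are all assumptions made for illustration. The sketch builds a table of small-block SADs once, reads the cost of any PU shape from that table, and selects the best candidate with a single minimum over packed integers, which on a GPU would typically map to integer compare/min operations rather than divergent branches.

```python
# Illustrative sketch only. Block size, table granularity, names, and the
# packed-key selection are assumptions, not details taken from the paper.

def sad_table(cur, ref, size=16, cell=4):
    """Return the SAD of every cell x cell sub-block of two size x size blocks."""
    n = size // cell
    table = [[0] * n for _ in range(n)]
    for y in range(size):
        for x in range(size):
            table[y // cell][x // cell] += abs(cur[y][x] - ref[y][x])
    return table


def pu_cost(table, x0, y0, w, h, cell=4):
    """SAD of the w x h PU at pixel offset (x0, y0), summed from table entries.

    Offsets and sizes are assumed to be multiples of `cell`.
    """
    return sum(table[r][c]
               for r in range(y0 // cell, (y0 + h) // cell)
               for c in range(x0 // cell, (x0 + w) // cell))


def best_mv(costs):
    """Return (index, cost) of the cheapest candidate without an if/else chain.

    Each candidate is packed as (cost << 16) | index, so one min() over the
    keys picks the lowest cost and breaks ties toward the lower index.
    Assumes fewer than 65536 candidates.
    """
    key = min((cost << 16) | idx for idx, cost in enumerate(costs))
    return key & 0xFFFF, key >> 16


if __name__ == "__main__":
    # Synthetic pixel data, only to exercise the functions.
    cur = [[(3 * x + 5 * y) % 11 for x in range(16)] for y in range(16)]
    ref = [[(7 * x + 2 * y) % 13 for x in range(16)] for y in range(16)]
    table = sad_table(cur, ref)
    # Costs of several PU shapes come from the same table, no pixel re-reads.
    print("16x16:", pu_cost(table, 0, 0, 16, 16))
    print("left 8x16:", pu_cost(table, 0, 0, 8, 16))
    print("top 16x4:", pu_cost(table, 0, 0, 16, 4))
    print("best candidate:", best_mv([40, 12, 12, 99]))
```

Because every PU cost is an independent sum over table entries, the costs of different PU sizes do not depend on one another, in contrast with merging smaller costs step by step into larger ones.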
