Fast motion estimation for HEVC on graphics processing unit (GPU)

Abstract The recent video compression standard, HEVC (High Efficiency Video Coding), is likely to be adopted in a wide range of applications in the near future. However, its encoding process is far too slow for real-time applications. At the same time, the computing capabilities of GPUs (graphics processing units) have grown considerably. In this paper, we propose a GPU-based parallel motion estimation (ME) algorithm to enhance the performance of an HEVC encoder. A frame is partitioned into two subframes for pipelined execution to improve GPU utilization. The encoding flow is reorganized to resolve data hazards in the pipelined execution. Two new methods are introduced in the proposed ME: decision of a representative search center position (RSCP) and warp-based concurrent parallel reduction (WCPR). RSCP uses the motion vectors of the co-located CTU in a previously encoded frame to resolve a dependency problem in parallel computation, with negligible coding loss. WCPR executes several parallel reduction operations concurrently, which increases thread utilization from 20 % to 89 % without any thread synchronization. The proposed encoder makes the portion of ME in the encoder negligible, at the cost of a 2.2 % bitrate increase over the HEVC test model (HM) encoder. The proposed ME itself is 130.7 times faster than that of the HM encoder.
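The utilization argument behind running several reductions concurrently can be illustrated with a toy model. The sketch below is not the WCPR kernel from the paper. It assumes a 32-lane warp, a plain pairwise tree reduction over 32-element arrays, and perfect packing of pending additions onto lanes at each level. The function name `reduction_utilization` and its parameters are hypothetical. Under these assumptions a single reduction keeps roughly 19 % of lane slots busy, which is close to the 20 % baseline quoted above, and packing more independent reductions into the same warp raises that share. The model is not expected to reproduce the reported 89 % figure.

```python
import math


def reduction_utilization(num_arrays, array_len=32, warp_size=32):
    """Fraction of lane slots doing useful additions in a toy tree reduction.

    Assumption (not from the paper): at each tree level, all pending
    additions from all arrays are packed onto the warp's lanes, and a
    level costs ceil(additions / warp_size) warp cycles.
    """
    total_adds = 0
    total_cycles = 0
    remaining = array_len  # elements still to be combined in each array
    while remaining > 1:
        adds_per_array = remaining // 2
        adds = num_arrays * adds_per_array
        total_adds += adds
        total_cycles += math.ceil(adds / warp_size)
        remaining -= adds_per_array  # each addition merges two elements into one
    return total_adds / (total_cycles * warp_size)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, "concurrent reductions:", reduction_utilization(n))
```

With one array the five tree levels perform 16, 8, 4, 2, and 1 additions, so only 31 of 160 lane slots are used. With several arrays the early levels fill whole warp cycles, and only the last few levels leave lanes idle.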