Flexible CTU-level parallel motion estimation by CPU and GPU pipeline for HEVC

In the high efficiency video coding (HEVC) encoder, motion estimation (ME) takes up more than 50% of the encoding time. To reduce the complexity of the ME module in HEVC, this paper proposes a flexible coding tree unit (CTU)-level parallel ME method based on CPU and GPU pipeline collaboration. First, a highly scalable CTU-level parallel motion search scheme on the GPU is provided, in which the parallel CTU group can be configured to any size to adapt to different sequence resolutions and hardware configurations. Then, the motion search range can be adaptively adjusted according to the motion intensity, which avoids wasting GPU time on slow-moving scenes while maintaining high performance for fast-moving scenes. Moreover, the ME information returned from the GPU can be reused by the CPU for fast mode decision. Experimental results show that the proposed method achieves up to 73% complexity reduction compared with the CPU-only HM10.0 anchor, with acceptable coding performance loss, providing higher performance than the state-of-the-art scheme.
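The abstract names its mechanisms without giving formulas, so the following is only a minimal Python sketch of the first two ideas under stated assumptions. CTUs are grouped in raster order into batches of a configurable size, standing in for one GPU dispatch, and the search range follows a hypothetical rule: about twice the mean motion-vector magnitude of the previous frame, rounded up to a multiple of 8 and clamped to [8, 64]. The function names (`ctu_groups`, `adaptive_search_range`, `full_search`, `motion_search_frame`), the thresholds, and the use of integer full-search SAD are illustrative assumptions, not the paper's method. GPU parallelism is simulated by a sequential loop.

```python
import random

# Hypothetical sketch of CTU-group parallel ME with an adaptive search range.
# Not the paper's implementation; all names and thresholds are assumptions.


def ctu_groups(width, height, ctu_size, group_size):
    """Split CTU raster-scan indices into consecutive groups of `group_size`.

    Each group stands in for one batch of CTUs searched in parallel on the GPU.
    """
    cols = (width + ctu_size - 1) // ctu_size
    rows = (height + ctu_size - 1) // ctu_size
    n = cols * rows
    return [list(range(i, min(i + group_size, n))) for i in range(0, n, group_size)]


def adaptive_search_range(prev_mvs, min_range=8, max_range=64):
    """Assumed rule: ~2x the mean MV magnitude of the previous frame,
    rounded up to a multiple of 8 and clamped to [min_range, max_range]."""
    if not prev_mvs:
        return max_range  # no motion history yet: use the full range
    mean_mag = sum(max(abs(dx), abs(dy)) for dx, dy in prev_mvs) / len(prev_mvs)
    r = int(2 * mean_mag)
    r = ((r + 7) // 8) * 8
    return max(min_range, min(max_range, r))


def full_search(cur, ref, x0, y0, bsize, sr):
    """Integer full search: return (sad, dx, dy) minimizing SAD within +/-sr."""
    h, w = len(cur), len(cur[0])
    best = None
    for dy in range(-sr, sr + 1):
        for dx in range(-sr, sr + 1):
            rx, ry = x0 + dx, y0 + dy
            if rx < 0 or ry < 0 or rx + bsize > w or ry + bsize > h:
                continue  # candidate block falls outside the reference frame
            sad = sum(abs(cur[y0 + j][x0 + i] - ref[ry + j][rx + i])
                      for j in range(bsize) for i in range(bsize))
            if best is None or sad < best[0]:
                best = (sad, dx, dy)
    return best


def motion_search_frame(cur, ref, ctu_size, group_size, sr):
    """Search CTUs group by group; returns {ctu_index: (dx, dy, sad)}."""
    h, w = len(cur), len(cur[0])
    assert w % ctu_size == 0 and h % ctu_size == 0, "toy version: no partial CTUs"
    cols = w // ctu_size
    results = {}
    for group in ctu_groups(w, h, ctu_size, group_size):
        # On a GPU every CTU in `group` would be searched concurrently.
        for idx in group:
            x0, y0 = (idx % cols) * ctu_size, (idx // cols) * ctu_size
            sad, dx, dy = full_search(cur, ref, x0, y0, ctu_size, sr)
            results[idx] = (dx, dy, sad)
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    W = H = 16
    ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
    # Current frame = reference shifted so the true motion vector is (2, 1).
    cur = [[ref[y + 1][x + 2] if y + 1 < H and x + 2 < W else 0
            for x in range(W)] for y in range(H)]
    sr = adaptive_search_range([(1, 0), (2, 1)])
    mvs = motion_search_frame(cur, ref, ctu_size=8, group_size=3, sr=sr)
    print(sr, mvs[0])
```

A larger `group_size` puts more CTUs in each batch, which generally helps keep a GPU busy on high-resolution sequences. The per-CTU `(dx, dy, sad)` tuples are the kind of ME information a CPU-side fast mode decision could consume.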
