Efficient Parallel Framework for HEVC Motion Estimation on Many-Core Processors

High Efficiency Video Coding (HEVC) provides higher coding efficiency than previous video coding standards at the cost of increased encoding complexity. The complexity increase of the motion estimation (ME) procedure is particularly significant given the complicated partitioning structure of HEVC, and fully exploiting the coding efficiency that HEVC offers requires a huge amount of computation. In this paper, we analyze the ME structure in HEVC and propose a parallel framework that decouples ME for different partitions on many-core processors. Building on the local parallel method (LPM), we first use a directed acyclic graph (DAG)-based order to parallelize coding tree units (CTUs) and adopt an improved LPM (ILPM) within each CTU. The combined scheme (DAGILPM) exploits both CTU-level and prediction unit (PU)-level parallelism. We then observe that there exist completely independent PUs (CIPUs) and partially independent PUs (PIPUs). When the degree of parallelism (DP) is smaller than the maximum DP of DAGILPM, we process the CIPUs and PIPUs, which further increases the DP. The data dependencies and coding efficiency stay the same as in LPM. Experiments show that on a 64-core system, compared with serial execution, the proposed scheme achieves speedups of more than 30 and 40 times for 1920 × 1080 and 2560 × 1600 video sequences, respectively.
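
The idea of a DAG-based CTU order can be illustrated with a small sketch. This is not the authors' implementation: the dependency pattern (each CTU waits for its left, top-left, top, and top-right neighbors), the function name `dag_waves`, and the grid sizes are all assumptions made for illustration. The sketch groups CTUs into "waves" using a level-by-level topological sort, so that every CTU in a wave has all of its predecessors in earlier waves and could, in principle, be processed in parallel.

```python
from collections import defaultdict

# Assumed dependency pattern (hypothetical, not taken from the paper):
# a CTU at (x, y) waits for its left, top-left, top, and top-right neighbors.
NEIGHBOR_OFFSETS = [(-1, 0), (-1, -1), (0, -1), (1, -1)]


def dag_waves(cols, rows, offsets=NEIGHBOR_OFFSETS):
    """Group the CTUs of a cols x rows grid into waves.

    All CTUs in one wave have every predecessor in an earlier wave,
    so they are mutually independent under the assumed pattern.
    """
    indegree = {}
    dependents = defaultdict(list)
    for y in range(rows):
        for x in range(cols):
            preds = [(x + dx, y + dy) for dx, dy in offsets
                     if 0 <= x + dx < cols and 0 <= y + dy < rows]
            indegree[(x, y)] = len(preds)
            for p in preds:
                dependents[p].append((x, y))

    # Level-by-level Kahn's algorithm: each pass releases the CTUs
    # whose last remaining predecessor finished in the current wave.
    ready = [node for node, deg in indegree.items() if deg == 0]
    waves = []
    while ready:
        waves.append(sorted(ready))
        next_ready = []
        for node in ready:
            for child in dependents[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = next_ready
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(dag_waves(4, 2)):
        print(f"wave {i}: {wave}")
```

Under this assumed pattern the largest wave gives an upper bound on CTU-level parallelism, which is the kind of limit that PU-level parallelism within each CTU is meant to relax.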
