GPU Parallelization of HEVC In-Loop Filters

In the High Efficiency Video Coding (HEVC) standard, multiple decoding modules have been designed to take advantage of parallel processing. In particular, the HEVC in-loop filters (i.e., the deblocking filter and sample adaptive offset) were conceived to be exploited by parallel architectures. However, the type of parallelism they offer mostly suits the capabilities of multi-core CPUs. This makes it a real challenge to exploit massively parallel architectures such as Graphics Processing Units (GPUs) efficiently, mainly because of the data dependencies between the HEVC decoding procedures. Accordingly, this paper presents a novel strategy to increase the amount of parallelism and the resulting performance of the HEVC in-loop filters on GPU devices. For this purpose, the proposed algorithm performs the HEVC filtering at the frame level and employs intrinsic GPU vector instructions. Compared with state-of-the-art HEVC in-loop filter implementations, the proposed approach also reduces the amount of required memory transfers, further boosting performance. Experimental results show that the proposed GPU in-loop filters deliver a significant improvement in decoding performance. For example, average frame rates of 76 frames per second (FPS) and 125 FPS for Ultra HD 4K are achieved on an embedded NVIDIA GPU for the All Intra and Random Access configurations, respectively.
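To illustrate why frame-level filtering exposes a large amount of parallelism, the following is a minimal C++ sketch of the band-offset mode of sample adaptive offset applied to a whole 8-bit frame. It is not the implementation evaluated in this paper. The names `SaoBandParams`, `sao_band_sample`, and `sao_band_frame` are hypothetical, and a single frame-wide parameter set is assumed for brevity, whereas HEVC signals these parameters per coding tree block. Each output sample depends only on its own input sample, so every loop iteration could in principle be mapped to one GPU thread.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical parameter set. Assumption: one set for the whole frame;
// in HEVC these values are signalled per coding tree block.
struct SaoBandParams {
    int band_position;           // first of four consecutive bands, in 0..31
    std::array<int, 4> offsets;  // one offset per selected band
};

// Band-offset SAO for a single 8-bit sample.
inline std::uint8_t sao_band_sample(std::uint8_t s, const SaoBandParams& p) {
    const int band = s >> 3;  // 8-bit depth: 32 bands of 8 values each
    // Distance from the first selected band, wrapping modulo 32.
    const int k = (band - p.band_position + 32) & 31;
    if (k >= 4) {
        return s;  // sample lies outside the four selected bands
    }
    const int v = static_cast<int>(s) + p.offsets[static_cast<std::size_t>(k)];
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));  // clip to 8 bits
}

// Frame-level pass: iterations are independent of each other, which is
// the property a one-thread-per-sample GPU mapping would rely on.
std::vector<std::uint8_t> sao_band_frame(const std::vector<std::uint8_t>& in,
                                         const SaoBandParams& p) {
    std::vector<std::uint8_t> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = sao_band_sample(in[i], p);
    }
    return out;
}
```

The deblocking filter and the edge-offset mode of sample adaptive offset read neighbouring samples, so a comparable sketch for them would need to handle those dependencies explicitly.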
