Fast Non-Local Adaptive In-Loop Filter Optimization on GPU

The non-local adaptive in-loop filter (NALF) for video coding achieves significant coding gain by exploiting image non-local self-similarity (NSS) to reduce compression artifacts. However, its intensive computation hinders practical deployment in video coding standards. In this paper, we propose a fast NALF optimization algorithm in a parallel-computing framework that leverages the massively parallel execution resources of the GPU. First, the computational complexity of the original NALF is analyzed in depth; then the pipelines of the computationally intensive modules are redesigned in a more parallel-friendly way to suit the general-purpose GPU. Specifically, we speed up NALF by optimizing thread allocation to maximize the degree of parallelism and by carefully designing the GPU block dimensions to avoid access conflicts. Group-level parallelization is designed for the collaborative filtering module, and pixel-level parallelization for the patch matching module. To reduce the cost of data transmission, the whole filtering process is implemented on the GPU, taking advantage of the low data dependency in NALF. Extensive experimental results show that the proposed GPU-based fast NALF optimization achieves high-speed processing while maintaining the significant coding performance of the original NALF, which shows the potential of NALF for future video coding standards.
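
The low data dependency that makes patch matching suitable for pixel-level parallelization can be illustrated with a small CPU-side sketch. This is not the paper's implementation: the function names (`ssd`, `match_patches`), the sum-of-squared-differences similarity measure, and the parameters `p`, `radius`, and `k` are assumptions chosen only for illustration. The point is that each candidate distance reads shared pixels and writes nothing shared, so on a GPU each loop iteration could plausibly be assigned to its own thread.

```python
def ssd(img, ay, ax, by, bx, p):
    """Sum of squared differences between two p x p patches of img."""
    total = 0
    for dy in range(p):
        for dx in range(p):
            d = img[ay + dy][ax + dx] - img[by + dy][bx + dx]
            total += d * d
    return total


def match_patches(img, ry, rx, p, radius, k):
    """Return the k patch positions most similar to the patch at (ry, rx).

    Hypothetical helper: searches a square window of the given radius
    around the reference patch and ranks candidates by SSD.
    """
    h, w = len(img), len(img[0])
    candidates = []
    for cy in range(max(0, ry - radius), min(h - p, ry + radius) + 1):
        for cx in range(max(0, rx - radius), min(w - p, rx + radius) + 1):
            # Each distance depends only on read-only pixels, so the
            # iterations are independent of one another.
            candidates.append((ssd(img, ry, rx, cy, cx, p), cy, cx))
    candidates.sort()
    return [(cy, cx) for _, cy, cx in candidates[:k]]


# Toy 6x6 image with a repeating 2x2 texture (illustrative data only).
img = [[(x % 2) + 2 * (y % 2) for x in range(6)] for y in range(6)]
group = match_patches(img, 2, 2, p=2, radius=2, k=4)
```

Here `group` holds the positions of four patches that would form one group of similar patches. In a group-level scheme, each such group could then be filtered independently of the others.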
