A recurrent video quality enhancement framework with multi-granularity frame-fusion and frame difference based attention

Abstract

In recent years, deep learning has attracted substantial research attention for video restoration. Among existing contributions, single-frame approaches rely purely on one reference frame and neglect the remaining neighboring frames when enhancing a target frame. By contrast, multi-frame approaches exploit temporal information within a sliding window, while the existing recurrent designs employ only a single preceding enhanced frame. It is therefore natural to exploit both multiple original neighboring frames and the preceding enhanced frames for video quality enhancement. In this paper, we propose a Recurrent video quality Enhancement framework with Multi-granularity frame-fusion and frame Difference based attention (REMD). Firstly, we devise a three-dimensional convolutional neural network based encoder-decoder fusion model, which fuses multiple frames at multiple granularities. Secondly, because severe compression artifacts tend to emerge on the edges and textures of compressed frames, we propose a frame difference based spatial attention method to emphasize the edges and textures of moving regions. Finally, a recurrent sliding window design is conceived for exploiting the temporal information in preceding enhanced frames and subsequent neighboring frames. Experiments demonstrate that our method achieves superior performance in comparison to the state-of-the-art contributions with substantially reduced spatial and computational complexity.
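To make two of the ideas in the abstract concrete, the following is a minimal sketch in plain Python and not the authors' implementation. It assumes grayscale frames stored as nested lists, an attention map obtained by normalising the absolute frame difference by its maximum, a residual re-weighting of the form f * (1 + a), and a window that simply drops indices outside the video. None of these choices is specified in the abstract, and the function names `frame_difference_attention`, `apply_attention` and `recurrent_window` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities


def frame_difference_attention(target: Frame, neighbor: Frame) -> Frame:
    """Spatial attention map in [0, 1] from the absolute frame difference.

    Assumption: dividing by the maximum difference is an illustrative
    normalisation; the abstract does not say how the map is computed.
    """
    diff = [[abs(t - n) for t, n in zip(t_row, n_row)]
            for t_row, n_row in zip(target, neighbor)]
    peak = max(max(row) for row in diff)
    if peak == 0:
        # Identical frames: no motion, so no region is emphasised.
        return [[0.0 for _ in row] for row in diff]
    return [[d / peak for d in row] for row in diff]


def apply_attention(features: Frame, attention: Frame) -> Frame:
    """Re-weight features as f * (1 + a), so static regions pass unchanged."""
    return [[f * (1.0 + a) for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(features, attention)]


def recurrent_window(t: int, num_frames: int, radius: int) -> List[Tuple[str, int]]:
    """List the inputs used to enhance frame t.

    Frames before t come from the already enhanced output; frame t and the
    frames after it come from the compressed input. Indices outside the
    video are dropped (an assumed boundary rule).
    """
    window = []
    for i in range(t - radius, t + radius + 1):
        if 0 <= i < num_frames:
            window.append(("enhanced" if i < t else "compressed", i))
    return window
```

For instance, `recurrent_window(2, 5, 2)` pairs enhanced frames 0 and 1 with compressed frames 2, 3 and 4, which mirrors the combination of preceding enhanced frames and subsequent neighboring frames described above. In a real model the attention would more plausibly be applied to learned feature maps than to raw pixels.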
