CVEGAN: A Perceptually-inspired GAN for Compressed Video Enhancement

We propose a new Generative Adversarial Network for Compressed Video quality Enhancement (CVEGAN). The CVEGAN generator employs a novel Mul2Res block (with multiple levels of residual learning branches), an enhanced residual non-local block (ERNB) and an enhanced convolutional block attention module (ECBAM). The ERNB is also used in the discriminator to improve its representational capability. The training strategy has been re-designed specifically for video compression applications: it employs a relativistic sphere GAN (ReSphereGAN) training methodology together with new perceptual loss functions. The proposed network has been evaluated in the context of two typical video compression enhancement tools: post-processing (PP) and spatial resolution adaptation (SRA). CVEGAN has been integrated into the MPEG HEVC video coding test model (HM16.20), and experimental results demonstrate significant coding gains (up to 28% for PP and 38% for SRA compared to the anchor) over existing state-of-the-art architectures for both coding tools across multiple datasets.
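
To give a feel for the "relativistic" part of the training methodology, the following is a minimal sketch of the relativistic average GAN losses from the relativistic discriminator literature. It is not the ReSphereGAN objective used for CVEGAN, which the abstract does not specify and which additionally builds on Sphere GAN. The function names and the use of plain Python lists of raw discriminator scores are assumptions made for illustration only.

```python
import math


def _softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def _mean(values):
    return sum(values) / len(values)


def ra_discriminator_loss(real_scores, fake_scores):
    """Relativistic average discriminator loss (illustrative sketch).

    real_scores / fake_scores are raw (pre-sigmoid) discriminator outputs.
    Each real score is compared with the mean fake score, and each fake
    score with the mean real score.
    """
    mean_real = _mean(real_scores)
    mean_fake = _mean(fake_scores)
    # -log(sigmoid(z)) == softplus(-z)
    loss_real = _mean([_softplus(-(r - mean_fake)) for r in real_scores])
    # -log(1 - sigmoid(z)) == softplus(z)
    loss_fake = _mean([_softplus(f - mean_real) for f in fake_scores])
    return loss_real + loss_fake


def ra_generator_loss(real_scores, fake_scores):
    """Relativistic average generator loss: the same form with roles swapped."""
    return ra_discriminator_loss(fake_scores, real_scores)


if __name__ == "__main__":
    real = [2.0, 1.5, 3.0]    # hypothetical scores for uncompressed frames
    fake = [-1.0, 0.5, -0.5]  # hypothetical scores for enhanced frames
    print("D loss:", ra_discriminator_loss(real, fake))
    print("G loss:", ra_generator_loss(real, fake))
```

In this formulation the generator is rewarded for making enhanced frames score higher than the average real frame, rather than for pushing an absolute "real" probability towards one.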
