Perceptual Learned Video Compression with Recurrent Conditional GAN

This paper proposes a Perceptual Learned Video Compression (PLVC) approach based on a recurrent conditional GAN. We employ a recurrent auto-encoder-based compression network as the generator and, most importantly, propose a recurrent conditional discriminator, which judges raw versus compressed video conditioned on both spatial and temporal features, including the latent representation, temporal motion, and the hidden states of recurrent cells. In adversarial training, this conditioning pushes the generated video to be not only spatially photo-realistic but also temporally consistent with the ground truth and coherent across frames. Experimental results show that the proposed PLVC model learns to compress video with good perceptual quality at low bit-rates, and that it outperforms the official HEVC test model (HM 16.20) and previous learned approaches on several perceptual quality metrics and in user studies. The code will be released at the project page: https://github.com/RenYang-home/PLVC.
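
To make the role of the conditions concrete, the adversarial objective can be sketched in the form of a standard conditional GAN loss, applied per frame. This is only an illustrative sketch: the notation is assumed here and does not come from the abstract ($x_t$ for the raw frame, $\hat{x}_t$ for the compressed frame, $y_t$ for the latent representation, $m_t$ for the temporal motion, $h^D_{t-1}$ for the discriminator's hidden state from the previous frame), and the exact loss used in the paper may differ.

```latex
% Assumed notation (not from the abstract):
%   x_t        raw frame            \hat{x}_t   compressed frame
%   y_t        latent representation
%   m_t        temporal motion
%   h^D_{t-1}  hidden state of the recurrent discriminator
\begin{align}
  \mathcal{L}_D &= \sum_{t=1}^{T} \Big[
      -\log D\big(x_t \mid y_t, m_t, h^D_{t-1}\big)
      -\log\Big(1 - D\big(\hat{x}_t \mid y_t, m_t, h^D_{t-1}\big)\Big)
    \Big], \\
  \mathcal{L}_G^{\mathrm{adv}} &= \sum_{t=1}^{T}
      -\log D\big(\hat{x}_t \mid y_t, m_t, h^D_{t-1}\big).
\end{align}
```

Under these assumptions, the discriminator sees the same conditions for the raw and the compressed frame, so it can only tell them apart by how well each frame agrees with the latent representation, the motion, and the history carried in $h^D_{t-1}$. In practice the generator term would be combined with rate and distortion terms, which are omitted here.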
