Advancing Learned Video Compression With In-Loop Frame Prediction

Recent years have witnessed increasing interest in end-to-end learned video compression. Most previous works exploit temporal redundancy by estimating and compressing a motion map that warps the reference frame towards the target frame. However, they fail to take full advantage of the historical priors in the sequence of reference frames. In this paper, we propose an Advanced Learned Video Compression (ALVC) approach with an in-loop frame prediction module, which predicts the target frame from the previously compressed frames without consuming any bit-rate. The predicted frame serves as a better reference than the previously compressed frame and therefore improves compression performance. The proposed in-loop prediction module is part of the end-to-end video compression framework and is jointly optimized with it. We propose recurrent and bi-directional in-loop prediction modules for compressing P-frames and B-frames, respectively. Experiments show that our ALVC approach achieves state-of-the-art performance among learned video compression methods. It also outperforms the default hierarchical B mode of x265 in terms of PSNR and beats the slowest mode of the SSIM-tuned x265 on MS-SSIM. The project page: https://github.com/RenYang-home/ALVC.
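
The intuition behind a bit-free predicted reference can be illustrated with a toy sketch. It is not the ALVC network: a simple per-pixel linear extrapolation stands in for the learned predictor, the frames are 1-D lists, and the "previously compressed" frames are assumed to be reconstructed losslessly. The function names `extrapolate`, `residual_energy`, and `make_frame` are hypothetical. Because the prediction depends only on frames the decoder already has, the decoder can repeat it without receiving any extra bits.

```python
def make_frame(t, width=8):
    """Toy 1-D frame: a spatial ramp whose brightness rises by 3 per time step."""
    return [10 * x + 3 * t for x in range(width)]


def extrapolate(prev2, prev1):
    """Stand-in predictor (assumption): linear extrapolation from two past frames."""
    return [2 * b - a for a, b in zip(prev2, prev1)]


def residual_energy(target, reference):
    """Sum of squared differences between the target and a reference frame."""
    return sum((t - r) ** 2 for t, r in zip(target, reference))


f0, f1, f2 = make_frame(0), make_frame(1), make_frame(2)

# Reference used by a conventional scheme: the previous frame itself.
energy_prev = residual_energy(f2, f1)

# Reference built in the loop from past frames only, so it costs no bits.
predicted = extrapolate(f0, f1)
energy_pred = residual_energy(f2, predicted)

print(energy_prev, energy_pred)
```

In this contrived case the temporal change is exactly linear, so the predicted reference leaves less residual to encode than the previous frame does. Real video is not this regular, which is why the paper uses learned, jointly optimized predictors.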
