Versatile Learned Video Compression

Learned video compression methods have shown great promise in catching up with traditional video codecs in rate-distortion (R-D) performance. However, existing learned video compression schemes are limited by the binding of the prediction mode to a fixed network framework. They cannot support different inter prediction modes and are therefore inapplicable to many scenarios. To break this limitation, we propose a versatile learned video compression (VLVC) framework that uses one model to support all possible prediction modes. Specifically, to realize versatile compression, we first build a motion compensation module that applies multiple 3D motion vector fields (i.e., voxel flows) for weighted trilinear warping in spatial-temporal space. The voxel flows convey the temporal reference position, which helps decouple inter prediction modes from the framework design. Second, for multiple-reference-frame prediction, we apply a flow prediction module that predicts accurate motion trajectories with unified polynomial functions. We show that the flow prediction module can substantially reduce the transmission cost of voxel flows. Experimental results demonstrate that our proposed VLVC not only supports versatile compression in various settings, but is also the first end-to-end learned video compression method that outperforms the latest VVC/H.266 standard reference software in terms of MS-SSIM.
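The two mechanisms named in the abstract, weighted trilinear warping with multiple voxel flows and polynomial trajectory prediction, can be illustrated with a small NumPy sketch. This is not the authors' implementation. The voxel-flow layout (dx, dy, tz) with tz as a continuous index into the stacked reference frames, the softmax normalization of the per-flow weight maps, grayscale frames, border clamping, and the polynomial model d(t) = c1*t + ... + cn*t^n fitted by least squares are all assumptions made for illustration. The helper names `trilinear_sample`, `weighted_trilinear_warp`, `fit_polynomial_trajectory`, and `predict_displacement` are hypothetical.

```python
import numpy as np


def trilinear_sample(volume, t, y, x):
    """Sample volume (T, H, W) at continuous coords (t, y, x), each (H, W).

    Coordinates are clamped to the volume bounds (border padding, an assumption).
    """
    T, H, W = volume.shape
    t = np.clip(t, 0, T - 1)
    y = np.clip(y, 0, H - 1)
    x = np.clip(x, 0, W - 1)
    t0, y0, x0 = (np.floor(c).astype(int) for c in (t, y, x))
    t1 = np.minimum(t0 + 1, T - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wt, wy, wx = t - t0, y - y0, x - x0
    out = np.zeros(t.shape)
    # Accumulate the 8 corners of the enclosing (t, y, x) cell.
    for ti, ft in ((t0, 1 - wt), (t1, wt)):
        for yi, fy in ((y0, 1 - wy), (y1, wy)):
            for xi, fx in ((x0, 1 - wx), (x1, wx)):
                out += ft * fy * fx * volume[ti, yi, xi]
    return out


def weighted_trilinear_warp(refs, flows, logits):
    """Blend K trilinear warps of a reference stack into one prediction.

    refs:   (T, H, W) reference frames stacked along time.
    flows:  (K, H, W, 3) hypothetical voxel flows with channels (dx, dy, tz),
            where tz is a continuous position along the reference axis.
    logits: (K, H, W) unnormalized weights, softmax-normalized over K.
    """
    K, H, W, _ = flows.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    w = np.exp(logits - logits.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)
    pred = np.zeros((H, W))
    for k in range(K):
        dx, dy, tz = flows[k, ..., 0], flows[k, ..., 1], flows[k, ..., 2]
        pred += w[k] * trilinear_sample(refs, tz, ys + dy, xs + dx)
    return pred


def fit_polynomial_trajectory(times, displacements, order=2):
    """Least-squares fit of d(t) = sum_{j=1..order} c_j * t**j per pixel.

    times: (N,) nonzero time offsets of references w.r.t. the current frame.
    displacements: (N, H, W, 2) observed motion toward each reference.
    No constant term, so d(0) = 0 by construction.
    """
    times = np.asarray(times, dtype=float)
    A = np.stack([times ** j for j in range(1, order + 1)], axis=1)  # (N, order)
    b = displacements.reshape(displacements.shape[0], -1)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs.reshape((order,) + displacements.shape[1:])


def predict_displacement(coeffs, t):
    """Evaluate the fitted per-pixel polynomial at time offset t -> (H, W, 2)."""
    powers = np.array([t ** j for j in range(1, coeffs.shape[0] + 1)], dtype=float)
    return np.tensordot(powers, coeffs, axes=1)
```

Because tz selects a possibly fractional position along the reference axis, the same warping routine serves one reference frame, references on both sides of the current frame, or more, which illustrates how voxel flows can keep the prediction mode out of the network structure. In a codec built along these lines, a predicted displacement could act as a prior so that only the correction to it is coded; the sketch does not claim this is how VLVC transmits its flows.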
