ELF-VC: Efficient Learned Flexible-Rate Video Coding

While learned video codecs have demonstrated great promise, they have yet to achieve sufficient efficiency for practical deployment. In this work, we propose several novel ideas for learned video compression that improve performance in the low-latency mode (I- and P-frames only) while considerably increasing computational efficiency. In this setting, for natural videos, our approach compares favorably across the entire R-D curve under the metrics PSNR, MS-SSIM and VMAF against all mainstream video standards (H.264, H.265, AV1) and all ML codecs. At the same time, our approach runs at least 5x faster and has fewer parameters than all ML codecs that report these figures. Our contributions include a flexible-rate framework that allows a single model to cover a large and dense range of bitrates at a negligible increase in computation and parameter count; an efficient backbone optimized for ML-based codecs; and a novel in-loop flow prediction scheme that leverages prior information towards more efficient compression. We benchmark our method, which we call ELF-VC (Efficient, Learned and Flexible Video Coding), on the popular video test sets UVG and MCL-JCV under the metrics PSNR, MS-SSIM and VMAF. For example, on UVG under PSNR, it reduces the BD-rate by 44% against H.264, 26% against H.265, 15% against AV1, and 35% against the current best ML codec. At the same time, on an NVIDIA Titan V GPU, our approach encodes/decodes VGA at 49/91 FPS, HD 720 at 19/35 FPS, and HD 1080 at 10/18 FPS.
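The BD-rate figures quoted above summarize the average bitrate difference between two codecs at equal quality. As a rough illustration of how such a number can be obtained, the sketch below averages the gap between two rate-quality curves in the log-rate domain. It is a simplification and not the evaluation code behind the reported results: Bjontegaard's metric is usually computed with a polynomial fit, whereas this sketch assumes piecewise-linear interpolation to stay dependency-free. The function names and the sample rate-quality points are hypothetical.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of y at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the sampled range")


def _mean_log_rate(points, lo, hi, steps=1000):
    """Average log-bitrate over the quality interval [lo, hi] (trapezoid rule)."""
    pts = sorted(points, key=lambda p: p[1])  # order the samples by quality
    quality = [q for _, q in pts]
    log_rate = [math.log(r) for r, _ in pts]
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        q = min(lo + i * h, hi)  # guard against floating-point overshoot
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * _interp(q, quality, log_rate)
    return total * h / (hi - lo)


def bd_rate(anchor, test):
    """Approximate BD-rate of `test` relative to `anchor`, in percent.

    Each argument is a list of (bitrate, quality) pairs, e.g. (kbps, PSNR in dB),
    with distinct quality values. Negative results mean `test` needs fewer bits
    than `anchor` for the same quality.
    """
    # Only the quality range covered by both curves can be compared.
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("the two curves do not overlap in quality")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(anchor, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0


# Hypothetical rate-quality points, made up for illustration only.
anchor = [(1000, 32.0), (2000, 35.0), (4000, 38.0), (8000, 41.0)]
test = [(750, 32.0), (1500, 35.0), (3000, 38.0), (6000, 41.0)]
print(f"BD-rate: {bd_rate(anchor, test):.1f}%")
```

Because the hypothetical test codec spends 25% fewer bits at every quality level, the sketch reports a BD-rate of about -25%. Averaging in the log-rate domain is the reason the result is a relative percentage and not an absolute bitrate difference.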
