Neural Video Compression using Spatio-Temporal Priors.

The pursuit of higher compression efficiency continuously drives the advances of video coding technologies. Fundamentally, we wish to find better "predictions" or "priors" that are reconstructed previously to remove the signal dependency efficiently and to accurately model the signal distribution for entropy coding. In this work, we propose a neural video compression framework, leveraging the spatial and temporal priors, independently and jointly to exploit the correlations in intra texture, optical flow based temporal motion and residuals. Spatial priors are generated using downscaled low-resolution features, while temporal priors (from previous reference frames and residuals) are captured using a convolutional neural network based long-short term memory (ConvLSTM) structure in a temporal recurrent fashion. All of these parts are connected and trained jointly towards the optimal rate-distortion performance. Compared with the High-Efficiency Video Coding (HEVC) Main Profile (MP), our method has demonstrated averaged 38% Bjontegaard-Delta Rate (BD-Rate) improvement using standard common test sequences, where the distortion is multi-scale structural similarity (MS-SSIM).

[1]  G. Bjontegaard,et al.  Calculation of Average PSNR Differences between RD-curves , 2001 .

[2]  Thomas Brox,et al.  FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Luc Van Gool,et al.  Conditional Probability Models for Deep Image Compression , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Lubomir D. Bourdev,et al.  Real-Time Adaptive Image Compression , 2017, ICML.

[5]  Gary J. Sullivan,et al.  Video Compression - From Concepts to the H.264/AVC Standard , 2005, Proceedings of the IEEE.

[6]  Ajay Luthra,et al.  Overview of the H.264/AVC video coding standard , 2003, IEEE Trans. Circuits Syst. Video Technol..

[7]  Steve Branson,et al.  Learned Video Compression , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  F. Gers,et al.  Long short-term memory in recurrent neural networks , 2001 .

[9]  Zhan Ma,et al.  DeepCoder: A deep neural network based video compression , 2017, 2017 IEEE Visual Communications and Image Processing (VCIP).

[10]  Gary J. Sullivan,et al.  Overview of the High Efficiency Video Coding (HEVC) Standard , 2012, IEEE Transactions on Circuits and Systems for Video Technology.

[11]  Seoung Wug Oh,et al.  Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Xiaoyun Zhang,et al.  DVC: An End-To-End Deep Video Compression Framework , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Gary J. Sullivan,et al.  Rate-distortion optimization for video compression , 1998, IEEE Signal Process. Mag..

[14]  Zhou Wang,et al.  Multiscale structural similarity for image quality assessment , 2003, The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003.

[15]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Humberto de Jesús Ochoa Domínguez,et al.  Versatile Video Coding , 2019 .

[17]  David Minnen,et al.  Variational image compression with a scale hyperprior , 2018, ICLR.

[18]  Zhan Ma,et al.  Deep Image Compression via End-to-End Learning , 2018, CVPR Workshops.

[19]  Dit-Yan Yeung,et al.  Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting , 2015, NIPS.