Learning for Video Compression With Recurrent Auto-Encoder and Recurrent Probability Model

The past few years have witnessed increasing interest in applying deep learning to video compression. However, existing approaches compress a video frame using only a small number of reference frames, which limits their ability to fully exploit the temporal correlation among video frames. To overcome this shortcoming, this paper proposes a Recurrent Learned Video Compression (RLVC) approach with a Recurrent Auto-Encoder (RAE) and a Recurrent Probability Model (RPM). Specifically, the RAE employs recurrent cells in both the encoder and the decoder. As such, temporal information from a large range of frames can be used for generating latent representations and reconstructing compressed outputs. Furthermore, the proposed RPM network recurrently estimates the Probability Mass Function (PMF) of the latent representation, conditioned on the distribution of previous latent representations. Due to the correlation among consecutive frames, the conditional cross entropy can be lower than the independent cross entropy, thus reducing the bit-rate. Experiments show that our approach achieves state-of-the-art learned video compression performance in terms of both PSNR and MS-SSIM. Moreover, our approach outperforms the default Low-Delay P (LDP) setting of x265 on PSNR, and also performs better on MS-SSIM than the SSIM-tuned x265 and the slowest setting of x265. The code is available at https://github.com/RenYang-home/RLVC.git.
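
The claim that conditioning can lower the expected bit cost can be illustrated with a toy calculation. The sketch below is not the RPM network: it assumes a hypothetical binary source that repeats its previous symbol with probability 0.9, and it assumes an ideal probability model that matches the true distribution, so that cross entropy equals entropy. All names and values are illustrative.

```python
import math


def entropy_bits(pmf):
    """Shannon entropy, in bits, of a discrete PMF given as a list."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


# Hypothetical toy source: a binary symbol that repeats its previous
# value with probability 0.9 (a stand-in for temporal correlation).
stay = 0.9
transition = [[stay, 1 - stay], [1 - stay, stay]]
marginal = [0.5, 0.5]  # stationary distribution of this symmetric chain

# Expected bits per symbol when every symbol is coded with the marginal PMF.
independent_bits = entropy_bits(marginal)

# Expected bits per symbol when the PMF is conditioned on the previous symbol.
conditional_bits = sum(
    marginal[prev] * entropy_bits(transition[prev]) for prev in range(2)
)

print(f"independent: {independent_bits:.3f} bits/symbol")
print(f"conditional: {conditional_bits:.3f} bits/symbol")
```

Under these assumptions the independent model spends 1 bit per symbol, while the conditional model spends roughly 0.469 bits per symbol. The gap shrinks to zero as `stay` approaches 0.5, that is, as the temporal correlation disappears.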
