Paper Information - Learned Video Compression

Learned Video Compression

We present a new algorithm for video coding, learned end-to-end for the low-latency mode. In this setting, our approach outperforms all existing video codecs across nearly the entire bitrate range. To our knowledge, this is the first ML-based method to do so. We evaluate our approach on standard video compression test sets of varying resolutions, and benchmark against all mainstream commercial codecs in the low-latency mode. On standard-definition videos, HEVC/H.265, AVC/H.264 and VP9 typically produce codes up to 60% larger than our algorithm. On high-definition 1080p videos, H.265 and VP9 typically produce codes up to 20% larger, and H.264 up to 35% larger. Furthermore, our approach does not suffer from blocking artifacts and pixelation, and thus produces videos that are more visually pleasing. We propose two main contributions. The first is a novel architecture for video compression, which (1) generalizes motion estimation to perform any learned compensation beyond simple translations, (2) rather than strictly relying on previously transmitted reference frames, maintains a state of arbitrary information learned by the model, and (3) enables jointly compressing all transmitted signals (such as optical flow and residual). The second is a framework for ML-based spatial rate control: a mechanism for assigning variable bitrates across space for each frame. This is a critical component for video coding, which to our knowledge had not been developed within a machine learning setting.
