Learned Video Compression

We present a new algorithm for video coding, learned end-to-end for the low-latency mode. In this setting, our approach outperforms all existing video codecs across nearly the entire bitrate range; to our knowledge, this is the first ML-based method to do so. We evaluate our approach on standard video compression test sets of varying resolutions and benchmark it against all mainstream commercial codecs in the low-latency mode. On standard-definition videos, HEVC/H.265, AVC/H.264 and VP9 typically produce codes up to 60% larger than our algorithm. On high-definition 1080p videos, H.265 and VP9 typically produce codes up to 20% larger, and H.264 up to 35% larger. Furthermore, our approach does not suffer from blocking artifacts and pixelation, and thus produces videos that are more visually pleasing. We make two main contributions. The first is a novel architecture for video compression, which (1) generalizes motion estimation to perform any learned compensation beyond simple translations, (2) rather than relying strictly on previously transmitted reference frames, maintains a state of arbitrary information learned by the model, and (3) enables jointly compressing all transmitted signals (such as optical flow and residual). The second is a framework for ML-based spatial rate control: a mechanism for assigning variable bitrates across space for each frame. This is a critical component of video coding which, to our knowledge, had not previously been developed within a machine learning setting.
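
The size figures above state how much larger the baseline codes are, which is easy to misread as a savings percentage. The short Python sketch below converts them, assuming each figure is a simple ratio of code sizes at matched quality (the evaluation may instead aggregate over a rate-distortion curve, so treat this as back-of-the-envelope arithmetic). The function names are illustrative and not taken from any library.

```python
# Hypothetical helpers: convert "the baseline code is p% larger than ours"
# into the share of bits our codec saves at the same quality.


def relative_size(pct_larger: float) -> float:
    """Our code size as a fraction of the baseline's code size.

    If baseline = ours * (1 + p/100), then ours / baseline = 1 / (1 + p/100).
    """
    return 1.0 / (1.0 + pct_larger / 100.0)


def bitrate_savings(pct_larger: float) -> float:
    """Percentage of the baseline's bits that our codec does not need."""
    return 100.0 * (1.0 - relative_size(pct_larger))


# "Up to" figures quoted in the abstract, treated as simple size ratios.
claims = {
    "SD, vs. H.265 / H.264 / VP9": 60.0,
    "1080p, vs. H.265 / VP9": 20.0,
    "1080p, vs. H.264": 35.0,
}
for label, p in claims.items():
    print(f"{label}: baseline {p:.0f}% larger -> {bitrate_savings(p):.1f}% fewer bits")
```

Under that assumption, codes up to 60% larger mean our codec needs as little as 1/1.6 = 62.5% of the baseline's bits (37.5% fewer), while 20% and 35% correspond to roughly 16.7% and 25.9% fewer bits. In general, a p% size increase corresponds to a saving of 100·p/(100 + p) percent, which is always smaller than p.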

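Spatial rate control decides how many bits each region of a frame receives. The abstract does not specify the learned mechanism, so the sketch below illustrates only the classical, non-learned form of the problem it targets: Lagrangian bit allocation. Each block (the data is invented for illustration) can be coded at a few quality levels with known (bits, distortion) pairs; for a multiplier λ (`lam` in the code), each block picks the level minimizing D + λR, and λ is bisected until the frame fits a bit budget. The names `Options`, `choose_levels`, `total_bits` and `allocate`, and all numbers, are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical data model: each block offers (bits, distortion) options,
# one per quality level, ordered from cheapest to most expensive.
Options = Sequence[Tuple[float, float]]


def choose_levels(blocks: Sequence[Options], lam: float) -> List[int]:
    """For a fixed multiplier lam, pick per block the level minimizing D + lam * R."""
    return [
        min(range(len(opts)), key=lambda i: opts[i][1] + lam * opts[i][0])
        for opts in blocks
    ]


def total_bits(blocks: Sequence[Options], levels: Sequence[int]) -> float:
    """Bits spent by a given per-block choice of levels."""
    return sum(opts[i][0] for opts, i in zip(blocks, levels))


def allocate(blocks: Sequence[Options], budget: float, iters: int = 60) -> List[int]:
    """Bisect lam so the per-block choices fit within `budget` bits.

    A larger lam penalizes bits more, so total bits never increase with lam.
    """
    if sum(min(r for r, _ in opts) for opts in blocks) > budget:
        raise ValueError("budget is below the cheapest possible allocation")
    lo, hi = 0.0, 1.0
    # Grow hi until the choices at hi fit the budget.
    while total_bits(blocks, choose_levels(blocks, hi)) > budget:
        lo, hi = hi, 2.0 * hi
    # Invariant: choices at hi fit the budget; choices at lo may not.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_bits(blocks, choose_levels(blocks, mid)) > budget:
            lo = mid
        else:
            hi = mid
    return choose_levels(blocks, hi)


# Invented example: a textured block where bits buy a lot of quality,
# and a flat block where extra bits barely help.
blocks = [
    [(10, 100.0), (40, 30.0), (90, 5.0)],
    [(10, 12.0), (40, 8.0), (90, 7.0)],
]
print(allocate(blocks, budget=100))
```

With a 100-bit budget, the textured block receives its most expensive level while the flat block stays at its cheapest, since extra bits there buy almost no distortion reduction. Because this method only reaches allocations on the lower convex hull of the combined rate-distortion points, it is not guaranteed to find the best allocation for every budget.
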