Conditional Entropy Coding for Efficient Video Compression

We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames. Unlike prior learning-based approaches, we reduce complexity by not performing any form of explicit transformations between frames and assume each frame is encoded with an independent state-of-the-art deep image compressor. We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs while being much faster and easier to implement. We then propose a novel internal learning extension on top of this architecture that brings an additional 10% bitrate savings without trading off decoding speed. Importantly, we show that our approach outperforms H.265 and other deep learning baselines in MS-SSIM on higher bitrate UVG video, and against all video codecs on lower framerates, while being thousands of times faster in decoding than deep models utilizing an autoregressive entropy model.

[1]  Lucas Theis,et al.  Lossy Image Compression with Compressive Autoencoders , 2017, ICLR.

[2]  Valero Laparra,et al.  Density Modeling of Images using a Generalized Normalization Transformation , 2015, ICLR.

[3]  Ajay Luthra,et al.  Overview of the H.264/AVC video coding standard , 2003, IEEE Trans. Circuits Syst. Video Technol..

[4]  Steve Branson,et al.  Learned Video Compression , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Margaret H. Pinson,et al.  The Consumer Digital Video Library , 2010 .

[6]  David Minnen,et al.  Variational image compression with a scale hyperprior , 2018, ICLR.

[7]  Gary J. Sullivan,et al.  Overview of the High Efficiency Video Coding (HEVC) Standard , 2012, IEEE Transactions on Circuits and Systems for Video Technology.

[8]  Zhou Wang,et al.  Multiscale structural similarity for image quality assessment , 2003, The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003.

[9]  Michal Irani,et al.  InGAN: Capturing and Retargeting the “DNA” of a Natural Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  Luc Van Gool,et al.  Conditional Probability Models for Deep Image Compression , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Shai Bagon,et al.  InGAN: Capturing and Remapping the "DNA" of a Natural Image , 2018 .

[12]  Michal Irani,et al.  "Zero-Shot" Super-Resolution Using Deep Internal Learning , 2017, CVPR.

[13]  Lubomir D. Bourdev,et al.  Real-Time Adaptive Image Compression , 2017, ICML.

[14]  Abdelaziz Djelouah,et al.  Neural Inter-Frame Compression for Video Coding , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Raquel Urtasun,et al.  DSIC: Deep Stereo Image Compression , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Michal Irani,et al.  Super-resolution from a single image , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[17]  Stephan Mandt,et al.  Deep Generative Video Compression , 2018, NeurIPS.

[18]  Tali Dekel,et al.  SinGAN: Learning a Generative Model From a Single Natural Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Abdelaziz Djelouah,et al.  Content Adaptive Optimization for Neural Image Compression , 2019, CVPR Workshops.

[20]  Valero Laparra,et al.  End-to-end Optimized Image Compression , 2016, ICLR.

[21]  Taco S. Cohen,et al.  Video Compression With Rate-Distortion Autoencoders , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  David Minnen,et al.  Joint Autoregressive and Hierarchical Priors for Learned Image Compression , 2018, NeurIPS.

[23]  Chao-Yuan Wu,et al.  Video Compression through Image Interpolation , 2018, ECCV.

[24]  Yao Wang,et al.  End-to-End Learnt Image Compression via Non-Local Attention Optimization and Improved Context Modeling , 2019, IEEE Transactions on Image Processing.

[25]  Sang Joon Kim,et al.  A Mathematical Theory of Communication , 2006 .

[26]  Luc Van Gool,et al.  Practical Full Resolution Learned Lossless Image Compression , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[28]  David Minnen,et al.  Variable Rate Image Compression with Recurrent Neural Networks , 2015, ICLR.

[29]  Xiaoyun Zhang,et al.  DVC: An End-To-End Deep Video Compression Framework , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Dani Lischinski,et al.  Non-stationary texture synthesis by adversarial expansion , 2018, ACM Trans. Graph..

[31]  Andrew Zisserman,et al.  Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Ming Lu,et al.  Neural Video Compression using Spatio-Temporal Priors. , 2019, 1902.07383.

[33]  Jooyoung Lee,et al.  Context-adaptive Entropy Model for End-to-end Optimized Image Compression , 2018, ICLR.

[34]  Andrea Vedaldi,et al.  Deep Image Prior , 2017, International Journal of Computer Vision.

[35]  Alexei A. Efros,et al.  Test-Time Training for Out-of-Distribution Generalization , 2019, ArXiv.

[36]  David Minnen,et al.  Full Resolution Image Compression with Recurrent Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Ping Wang,et al.  MCL-JCV: A JND-based H.264/AVC video quality assessment dataset , 2016, 2016 IEEE International Conference on Image Processing (ICIP).