Deep Probabilistic Video Compression

We propose a variational inference approach to deep probabilistic video compression. Our model builds on advances in variational autoencoders (VAEs) for sequential data and combines them with recent work on neural image compression. The approach jointly learns to transform the original video into a lower-dimensional representation and to entropy code this representation according to a temporally conditioned probabilistic model. We split the latent space into local (per-frame) and global (per-segment) variables and show that training the VAE to use both representations improves rate-distortion performance. Evaluation on small videos from public data sets of varying complexity and diversity shows that our model yields competitive results when trained on generic video content. For videos with specialized content, the model achieves extreme compression performance when trained on similar videos.
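As a rough illustration of how a temporally conditioned prior can lower the bit rate, the sketch below estimates the code length of rounded latents for one segment. It is a toy under stated assumptions, not the model described above: the names `bits_for_symbol`, `segment_rate`, and `rd_loss` are hypothetical, the prior is a hand-set discretized Gaussian rather than a learned network, the local prior for frame t is simply centered on the latent of frame t-1, and the beta-weighted objective is one common form of the rate-distortion trade-off.

```python
import math


def gaussian_cdf(x, mean, scale):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def bits_for_symbol(k, mean, scale):
    """Ideal code length (bits) of integer k under a Gaussian
    discretized into unit-width bins centered on the integers."""
    p = gaussian_cdf(k + 0.5, mean, scale) - gaussian_cdf(k - 0.5, mean, scale)
    return -math.log2(max(p, 1e-12))  # clamp to avoid log(0)


def segment_rate(global_latent, local_latents, scale=1.0):
    """Estimated bits for one segment: global code plus per-frame codes.

    Assumption: the global (per-segment) latent uses a fixed zero-mean
    prior, and the local (per-frame) prior at frame t is centered on the
    local latent of frame t-1 (zero for the first frame).
    """
    bits = sum(bits_for_symbol(k, 0.0, scale) for k in global_latent)
    prev = None
    for z_t in local_latents:
        means = prev if prev is not None else [0] * len(z_t)
        bits += sum(bits_for_symbol(k, m, scale) for k, m in zip(z_t, means))
        prev = z_t
    return bits


def rd_loss(distortion, rate_bits, beta):
    """Assumed rate-distortion objective: distortion plus beta times rate."""
    return distortion + beta * rate_bits


# Latents that change little between frames are cheap under this prior.
static = segment_rate([0], [[5], [5], [5]])
jumpy = segment_rate([0], [[5], [-5], [5]])
print(static < jumpy)
```

In this sketch only the first frame of the static sequence pays for the distance from zero; later frames are predicted well by their predecessor, which is the intuition behind conditioning the entropy model on time.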
