Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding

Latent variable models have been successfully applied in lossless compression with the bits-back coding algorithm. However, bits-back suffers from an increase in the bitrate equal to the KL divergence between the approximate posterior and the true posterior. In this paper, we show how to remove this gap asymptotically by deriving bits-back coding algorithms from tighter variational bounds. The key idea is to exploit extended-space representations of Monte Carlo estimators of the marginal likelihood. Naively applied, our schemes would require more initial bits than the standard bits-back coder, but we show how to drastically reduce this additional cost with couplings in the latent space. When parallel architectures can be exploited, our coders can achieve better rates than bits-back with little additional cost. We demonstrate improved lossless compression rates in a variety of settings, especially in out-of-distribution or sequential data compression.
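The rate identities behind these claims can be sketched in standard VAE/bits-back notation; the following is a summary under the usual assumptions (a latent variable model p(x, z) with approximate posterior q(z | x)), not an excerpt from the paper.

```latex
% Expected rate of standard bits-back coding: the negative ELBO.
% The bitrate exceeds the ideal -log p(x) by exactly the KL gap
% between the approximate and true posteriors.
\[
  R_{\mathrm{BB}}(x)
    = \mathbb{E}_{q(z \mid x)}\!\left[\log q(z \mid x) - \log p(x, z)\right]
    = -\log p(x) + \mathrm{KL}\!\left(q(z \mid x) \,\|\, p(z \mid x)\right).
\]
% A K-sample importance-weighted (IWAE-style) bound is tighter and
% approaches the marginal log-likelihood as K grows:
\[
  \mathcal{L}_K(x)
    = \mathbb{E}_{z_{1:K} \sim q(z \mid x)}\!\left[
        \log \frac{1}{K} \sum_{k=1}^{K} \frac{p(x, z_k)}{q(z_k \mid x)}
      \right]
    \le \log p(x),
  \qquad
  \mathcal{L}_K(x) \xrightarrow[K \to \infty]{} \log p(x).
\]
% A bits-back coder built on an extended-space representation of this
% estimator can therefore approach the ideal rate -log p(x).
```

The trade-off is that such a coder must decode auxiliary randomness for K latent samples rather than one, which is the source of the extra initial-bits cost that the abstract proposes to reduce with couplings in the latent space.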
