Exact Rate-Distortion in Autoencoders via Echo Noise

Compression is at the heart of effective representation learning. However, lossy compression is typically achieved through simple parametric models like Gaussian noise to preserve analytic tractability, and the limitations this imposes on learning are largely unexplored. Further, the Gaussian prior assumptions in models such as variational autoencoders (VAEs) provide only an upper bound on the compression rate in general. We introduce a new noise channel, \emph{Echo noise}, that admits a simple, exact expression for mutual information for arbitrary input distributions. The noise is constructed in a data-driven fashion and does not require restrictive distributional assumptions. With its expressive encoding mechanism and exact rate regularization, Echo leads to improved bounds on log-likelihood and dominates $\beta$-VAEs across the achievable range of rate-distortion trade-offs. Finally, we show that Echo noise can outperform flow-based methods without the need to train additional distributional transformations.
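To make the channel concrete, below is a minimal, illustrative NumPy sketch of an Echo-style noise channel, not the authors' reference implementation. It assumes a diagonal scale $S(x)$ with entries in $(0,1)$, builds the noise $\epsilon$ by iterating the channel over shuffled samples from the same batch (truncating the series), and applies $z = f(x) + S(x) \odot \epsilon$; under these assumptions the exact rate takes the form $I(x;z) = -\mathbb{E}_x\big[\sum_j \log s_j(x)\big]$. The function names `echo_noise_sample` and `echo_channel`, the truncation depth, and the permutation-based noise sampling are assumptions made for this example.

```python
import numpy as np

def echo_noise_sample(f, s, n_iters=16, rng=None):
    """Build Echo-style noise for a batch by iterating the channel
    over shuffled batch samples, truncated after `n_iters` terms.

    f : (batch, dim) encoder outputs f(x)
    s : (batch, dim) diagonal scales S(x), entries in (0, 1)
    """
    rng = np.random.default_rng() if rng is None else rng
    batch = f.shape[0]
    eps = np.zeros_like(f)
    prefix = np.ones_like(f)  # running product of scales along the series
    for _ in range(n_iters):
        # Use other samples in the batch as the noise source; a careful
        # implementation would exclude each example's own output here.
        perm = rng.permutation(batch)
        eps = eps + prefix * f[perm]
        prefix = prefix * s[perm]
    return eps

def echo_channel(f, s, n_iters=16, rng=None):
    """Apply z = f(x) + S(x) * eps and return (z, rate), where `rate`
    is the batch estimate of -E_x[ sum_j log s_j(x) ]."""
    eps = echo_noise_sample(f, s, n_iters=n_iters, rng=rng)
    z = f + s * eps
    rate = -np.mean(np.sum(np.log(s), axis=1))
    return z, rate

# Toy usage with random "encoder" outputs; in practice f and s would come
# from a neural network, with s squashed into (0, 1) (e.g., via a sigmoid).
rng = np.random.default_rng(0)
f = rng.standard_normal((128, 8))
s = 1.0 / (1.0 + np.exp(-rng.standard_normal((128, 8))))
z, rate = echo_channel(f, s, rng=rng)
print(z.shape, float(rate))
```

Because the scale entries lie in $(0,1)$, the truncated noise series converges geometrically, and the resulting rate term can be added to a reconstruction loss in the same place a KL penalty would appear in a standard VAE objective.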
