Auto-Encoding Total Correlation Explanation

Advances in unsupervised learning enable reconstruction and generation of samples from complex distributions, but this success is marred by the inscrutability of the representations learned. We propose an information-theoretic approach to characterizing disentanglement and dependence in representation learning using multivariate mutual information, also called total correlation. The principle of Total Correlation Explanation (CorEx) has motivated successful unsupervised learning applications across a variety of domains, though only under restrictive assumptions. Here we relax those restrictions by introducing a flexible variational lower bound on the CorEx objective. Surprisingly, this lower bound is equivalent to the evidence lower bound used in variational autoencoders (VAEs) under certain conditions. This information-theoretic view of VAEs deepens our understanding of hierarchical VAEs and motivates a new algorithm, AnchorVAE, which makes latent codes more interpretable through information maximization and enables generation of richer and more realistic samples.
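
For reference, the two quantities the abstract connects can be written in standard notation (a sketch using conventional symbols, not notation taken from the paper itself). Total correlation measures how far a joint distribution is from the product of its marginals, and the VAE objective is the usual evidence lower bound:

\begin{align}
  TC(X) &= \sum_{i=1}^{n} H(X_i) - H(X_1, \dots, X_n)
         = D_{\mathrm{KL}}\!\left( p(x) \,\Big\|\, \textstyle\prod_{i=1}^{n} p(x_i) \right), \\
  \mathcal{L}_{\mathrm{ELBO}}(\theta, \phi; x)
        &= \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
         - D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right).
\end{align}

CorEx seeks latent factors $Z$ that maximize $TC(X) - TC(X \mid Z)$, the amount of dependence among the observed variables explained by $Z$; the result highlighted above is that a variational lower bound on this objective coincides, under certain conditions, with $\mathcal{L}_{\mathrm{ELBO}}$.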
