Learning Structured Latent Factors from Dependent Data: A Generative Model Framework from an Information-Theoretic Perspective

Learning controllable and generalizable representations of multivariate data with desired structural properties remains a fundamental problem in machine learning. In this paper, we present a novel framework for learning generative models whose latent spaces exhibit a variety of underlying structures. We encode the inductive bias as mask variables that specify the dependency structure of the graphical model, and we extend the theory of the multivariate information bottleneck to enforce it. Our model provides a principled approach to learning a set of semantically meaningful latent factors that reflect various desired structures, such as capturing correlations or encoding invariances, while also offering the flexibility to estimate the dependency structure automatically from data. We show that our framework unifies many existing generative models and can be applied to a variety of tasks, including multi-modal data modeling, algorithmic fairness, and invariant risk minimization.
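To make the mask idea concrete, the sketch below shows one way a mask-gated, information-bottleneck-style objective could be written in PyTorch. It is an illustrative assumption, not the paper's implementation: the name MaskedVAE, the fixed mask buffer, the averaged per-view encoders, and the beta-weighted KL term are hypothetical stand-ins for the masked dependency structure and the compression/relevance trade-off described above.

```python
# A minimal sketch (assumed, not the authors' code) of how binary mask
# variables could gate which latent factors feed which observed variables
# in a VAE-style objective.  All names and dimensions are illustrative.
import torch
import torch.nn as nn


class MaskedVAE(nn.Module):
    def __init__(self, n_x=2, x_dim=16, n_z=2, z_dim=4, hidden=64):
        super().__init__()
        # One encoder per observed variable x_i, each predicting the
        # Gaussian parameters of all n_z latent factors.
        self.encoders = nn.ModuleList([
            nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 2 * n_z * z_dim))
            for _ in range(n_x)])
        # One decoder per observed variable, fed the mask-gated latents.
        self.decoders = nn.ModuleList([
            nn.Sequential(nn.Linear(n_z * z_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, x_dim))
            for _ in range(n_x)])
        # mask[i, j] = 1 iff latent factor z_j is connected to x_i.
        # Here it is a fixed inductive bias; the paper also considers
        # estimating this structure from data.
        self.register_buffer("mask", torch.ones(n_x, n_z))
        self.n_z, self.z_dim = n_z, z_dim

    def forward(self, xs, beta=1.0):
        # Pool the per-view Gaussian parameters with a simple average
        # (a crude stand-in for a principled aggregation rule).
        stats = torch.stack([enc(x) for enc, x in zip(self.encoders, xs)])
        mu, logvar = stats.mean(0).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        # KL(q(z|x) || N(0, I)) plays the role of the IB compression term.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(-1).mean()
        recon = 0.0
        for i, dec in enumerate(self.decoders):
            # Zero out latent factors the mask disconnects from x_i.
            m = self.mask[i].repeat_interleave(self.z_dim)
            recon = recon + ((dec(z * m) - xs[i]) ** 2).sum(-1).mean()
        return recon + beta * kl


# Usage: two observed views, batch of 8.
model = MaskedVAE()
xs = [torch.randn(8, 16), torch.randn(8, 16)]
loss = model(xs)
loss.backward()
```

Setting a row of the mask to encode, say, a shared factor plus view-private factors would recover the kind of multi-modal structure the abstract alludes to; making the mask entries learnable (e.g., via a Gumbel-Softmax relaxation) would correspond to estimating the dependency structure from data.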
