Hierarchical Variational Models

Black box variational inference allows researchers to easily prototype and evaluate an array of models. Recent advances allow such algorithms to scale to high dimensions. However, a central question remains: how can we specify an expressive variational distribution that maintains efficient computation? To address this, we develop hierarchical variational models (HVMs). HVMs augment a variational approximation with a prior on its parameters, which allows it to capture complex structure for both discrete and continuous latent variables. The algorithm we develop is black box, can be used for any HVM, and has the same computational efficiency as the original approximation. We study HVMs on a variety of deep discrete latent variable models. HVMs generalize other expressive variational distributions and maintain higher fidelity to the posterior.
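The core idea, placing a prior on the parameters of a simple variational approximation, can be illustrated with a small sampling sketch. This is not the paper's algorithm or any library's API. It only assumes, for illustration, a mean-field Bernoulli approximation over two binary latent variables whose logits are drawn from a Gaussian prior. The function name `sample_hvm` and all numeric values are hypothetical. Marginalizing over the logits induces dependence between the latent variables that a plain mean-field approximation cannot represent.

```python
import numpy as np


def sample_hvm(n_samples, mean, cov, rng):
    """Draw binary vectors z from an illustrative hierarchical variational model.

    Assumed (hypothetical) construction:
      1. lam ~ Normal(mean, cov)           prior on the mean-field parameters (logits)
      2. z_i ~ Bernoulli(sigmoid(lam_i))   mean-field likelihood given lam
    """
    lam = rng.multivariate_normal(mean, cov, size=n_samples)
    probs = 1.0 / (1.0 + np.exp(-lam))
    # Independent Bernoulli draws given lam; dependence enters only through lam.
    return (rng.random(probs.shape) < probs).astype(int)


rng = np.random.default_rng(0)
mean = np.zeros(2)

# Prior with strongly correlated logits: z_1 and z_2 become dependent.
cov_corr = np.array([[4.0, 3.8], [3.8, 4.0]])
z_corr = sample_hvm(20000, mean, cov_corr, rng)

# Prior with independent logits: the marginal over z factorizes again.
cov_indep = np.diag([4.0, 4.0])
z_indep = sample_hvm(20000, mean, cov_indep, rng)

corr_hvm = np.corrcoef(z_corr.T)[0, 1]
corr_mf = np.corrcoef(z_indep.T)[0, 1]
print(f"correlated prior:  corr(z_1, z_2) = {corr_hvm:.3f}")
print(f"independent prior: corr(z_1, z_2) = {corr_mf:.3f}")
```

With the correlated prior the sampled latent variables are clearly positively correlated, while the independent prior yields a correlation near zero. Sampling costs one extra draw of the logits per sample, so the cost per sample stays close to that of the mean-field approximation in this toy setting.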
