Conditionally structured variational Gaussian approximation with importance weights

We develop flexible methods for deriving variational inference algorithms for models with complex latent variable structure. By splitting the variables in these models into “global” parameters and “local” latent variables, we define a class of variational approximations that exploit this partitioning and go beyond Gaussian variational approximation. The approach is motivated by the fact that, in many hierarchical models, global variance parameters determine the scale of the local latent variables in their posterior conditional on the global parameters. We also consider parsimonious parametrizations that use conditional independence structure, as well as improved estimation of the log marginal likelihood and of the variational density using importance weights. These methods are shown to improve significantly on Gaussian variational approximation methods at a similar computational cost. Application of the methodology is illustrated using generalized linear mixed models and state space models.
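To make the two ideas above concrete, here is a minimal stdlib Python sketch under stated assumptions. It uses a toy hierarchical model that is not taken from the paper: a global parameter ω with ω ~ N(0, 1), local effects b_i | ω ~ N(0, exp(2ω)), and observations y_i | b_i ~ N(b_i, 1). The variational family q(ω) q(b | ω) is a hypothetical instance of conditional structure, not the authors' exact parametrization: q(ω) is Gaussian, and given ω each b_i is Gaussian with mean a_i + c_i ω and standard deviation exp(f_i + g_i ω), so the local scale moves with the global draw. The function `iw_bound` draws one Monte Carlo estimate of the importance-weighted lower bound log (1/K) Σ_k p(y, θ_k)/q(θ_k) on log p(y), which reduces to an ELBO estimate when K = 1. All names and the hand-picked parameter values are illustrative, and fitting the parameters by stochastic gradient ascent is omitted.

```python
import math
import random

# Toy hierarchical model (an illustrative assumption, not taken from the paper):
#   omega ~ N(0, 1)                     global parameter: log prior sd of the b_i
#   b_i | omega ~ N(0, exp(2 * omega))  local latent variables, i = 1..n
#   y_i | b_i ~ N(b_i, 1)               observations


def log_normal(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def log_joint(y, omega, b):
    """log p(y, omega, b) for a single draw (omega, b)."""
    lp = log_normal(omega, 0.0, 1.0)
    lp += sum(log_normal(bi, 0.0, math.exp(omega)) for bi in b)
    lp += sum(log_normal(yi, bi, 1.0) for yi, bi in zip(y, b))
    return lp


def local_moments(params, omega):
    """Mean and sd of the hypothetical q(b_i | omega); both depend on omega."""
    _, _, a, c, f, g = params
    means = [ai + ci * omega for ai, ci in zip(a, c)]
    sds = [math.exp(fi + gi * omega) for fi, gi in zip(f, g)]
    return means, sds


def sample_q(params, rng):
    """Draw omega ~ q(omega) = N(mu_g, exp(log_s_g)^2), then b ~ q(b | omega)."""
    mu_g, log_s_g = params[0], params[1]
    omega = rng.gauss(mu_g, math.exp(log_s_g))
    means, sds = local_moments(params, omega)
    return omega, [rng.gauss(m, s) for m, s in zip(means, sds)]


def log_q(params, omega, b):
    """log q(omega) + log q(b | omega) for a single draw."""
    lq = log_normal(omega, params[0], math.exp(params[1]))
    means, sds = local_moments(params, omega)
    return lq + sum(log_normal(bi, m, s) for bi, m, s in zip(b, means, sds))


def log_mean_exp(xs):
    """Numerically stable log((1/len(xs)) * sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))


def log_weights(y, params, K, rng):
    """K importance log-weights log p(y, theta_k) - log q(theta_k), theta_k ~ q."""
    out = []
    for _ in range(K):
        omega, b = sample_q(params, rng)
        out.append(log_joint(y, omega, b) - log_q(params, omega, b))
    return out


def iw_bound(y, params, K, rng):
    """One Monte Carlo draw of the importance-weighted bound (K = 1: an ELBO draw)."""
    return log_mean_exp(log_weights(y, params, K, rng))


def exact_log_marginal(y, lo=-10.0, hi=10.0, n_grid=20001):
    """log p(y) by a Riemann sum over omega, using y_i | omega ~ N(0, 1 + exp(2 omega))."""
    h = (hi - lo) / (n_grid - 1)
    terms = []
    for j in range(n_grid):
        w = lo + j * h
        sd = math.sqrt(1.0 + math.exp(2.0 * w))
        terms.append(log_normal(w, 0.0, 1.0) + sum(log_normal(yi, 0.0, sd) for yi in y))
    return log_mean_exp(terms) + math.log(n_grid * h)


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulate n = 10 observations with omega = 0.5, so marginally y_i ~ N(0, 1 + e).
    y = [rng.gauss(0.0, math.sqrt(1.0 + math.e)) for _ in range(10)]
    n = len(y)
    # Hand-picked q: first-order expansion around omega = 0.5 of the exact
    # b_i | omega, y_i ~ N(r * y_i, r), r = exp(2 omega) / (1 + exp(2 omega)).
    params = (0.5, 0.0, [0.535 * yi for yi in y], [0.393 * yi for yi in y],
              [-0.29] * n, [0.27] * n)
    for K in (1, 10, 100):
        est = sum(iw_bound(y, params, K, rng) for _ in range(100)) / 100
        print(f"K = {K:3d}: average bound {est:.3f}")
    print(f"quadrature log p(y): {exact_log_marginal(y):.3f}")
```

In this toy model the exact conditional posterior is b_i | ω, y_i ~ N(r y_i, r) with r = exp(2ω)/(1 + exp(2ω)), so a Gaussian whose local scale is fixed cannot track it across values of ω, whereas the conditional family can follow it approximately. Averaged over repetitions, the bound should move toward the quadrature value of log p(y) as K grows, since the importance-weighted bound is non-decreasing in K in expectation.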
