A Stochastic Variational Framework for Fitting and Diagnosing Generalized Linear Mixed Models

In stochastic variational inference, the variational Bayes objective function is optimized by stochastic gradient approximation: gradients computed on small random subsets of the data are used to approximate the true gradient over the whole data set. Because the data can be processed in mini-batches, complex models can be fit to large data sets. In this article, we extend stochastic variational inference for conjugate-exponential models to nonconjugate models and present a stochastic nonconjugate variational message passing algorithm for fitting generalized linear mixed models that is scalable to large data sets. In addition, we show that diagnostics for prior-likelihood conflict, which are useful for Bayesian model criticism, can be obtained automatically from nonconjugate variational message passing, as an alternative to simulation-based Markov chain Monte Carlo methods. Finally, we demonstrate that for moderate-sized data sets, convergence can be accelerated by using the stochastic version of nonconjugate variational message passing in the initial stage of optimization before switching to the standard version.
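
To make the mini-batch update and the two-stage schedule concrete, the following is a minimal Python sketch on a deliberately tiny model, not the generalized linear mixed model treated in the article. It assumes a Poisson regression with a single coefficient beta, a N(0, 100) prior, and a Gaussian approximation q(beta) = N(mu, v), for which E_q[exp(x*beta)] = exp(x*mu + x^2*v/2) has a closed form. The update used is the one-dimensional Gaussian nonconjugate message passing step (precision set to -2 dS/dv, mean moved by v times dS/dmu, where S is the expected log joint). In the stochastic stage, data sums are estimated from a mini-batch scaled by N/|B|, and the result is blended into the current natural parameters with step size rho_t = (t + tau)^(-kappa). All function names (`simulate_poisson_data`, `ncvmp_target`, `fit`) and tuning values (batch size 50, tau = 1, kappa = 0.7, 20 stochastic steps) are hypothetical choices for illustration.

```python
import math
import random


def simulate_poisson_data(n, beta, seed=0):
    """Simulate y_i ~ Poisson(exp(beta * x_i)) with x_i ~ Uniform(-1, 1)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        # Knuth's multiplication method; adequate for the small rates used here.
        limit, k, p = math.exp(-math.exp(beta * x)), 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        xs.append(x)
        ys.append(k)
    return xs, ys


def ncvmp_target(mu, v, xs, ys, scale, prior_var):
    """One Gaussian NCVMP update for q(beta) = N(mu, v) in a Poisson log-link model.

    Data sums are multiplied by `scale` (N / batch size for a mini-batch,
    1 for the full data); the prior term is never scaled.
    """
    grad_mu = -mu / prior_var   # dS/dmu from the N(0, prior_var) prior
    grad_v = -0.5 / prior_var   # dS/dv from the prior
    for x, y in zip(xs, ys):
        e = math.exp(x * mu + 0.5 * x * x * v)  # E_q[exp(x * beta)]
        grad_mu += scale * (y * x - x * e)
        grad_v += scale * (-0.5 * x * x * e)
    v_new = -0.5 / grad_v  # grad_v < 0, so v_new > 0
    return mu + v_new * grad_mu, v_new


def _natural(mu, v):
    return mu / v, -0.5 / v


def _moment(l1, l2):
    v = -0.5 / l2
    return l1 * v, v


def fit(xs, ys, prior_var=100.0, n_stochastic=0, batch_size=50,
        tau=1.0, kappa=0.7, tol=1e-10, max_iter=200, seed=1):
    """Stochastic NCVMP for n_stochastic steps, then standard NCVMP to convergence."""
    rng = random.Random(seed)
    n = len(xs)
    mu, v = 0.0, 1.0
    # Stage 1: damped mini-batch updates in natural-parameter space.
    for t in range(1, n_stochastic + 1):
        batch = rng.sample(range(n), batch_size)
        mu_hat, v_hat = ncvmp_target(mu, v, [xs[i] for i in batch],
                                     [ys[i] for i in batch], n / batch_size, prior_var)
        rho = (t + tau) ** (-kappa)
        (l1, l2), (h1, h2) = _natural(mu, v), _natural(mu_hat, v_hat)
        mu, v = _moment((1 - rho) * l1 + rho * h1, (1 - rho) * l2 + rho * h2)
    # Stage 2: full-batch updates until the parameters stop changing.
    for it in range(1, max_iter + 1):
        mu_new, v_new = ncvmp_target(mu, v, xs, ys, 1.0, prior_var)
        converged = abs(mu_new - mu) < tol and abs(v_new - v) < tol * v
        mu, v = mu_new, v_new
        if converged:
            return mu, v, it
    return mu, v, max_iter


xs, ys = simulate_poisson_data(1000, beta=0.5)
mu_full, v_full, it_full = fit(xs, ys)                   # standard NCVMP only
mu_hyb, v_hyb, it_hyb = fit(xs, ys, n_stochastic=20)     # stochastic, then standard
print(mu_full, v_full, it_full)
print(mu_hyb, v_hyb, it_hyb)
```

Both runs should end at the same fixed point, since the second stage is the standard full-batch update in each case; the stochastic stage only changes the starting point handed to it. Whether this saves work on a real mixed model depends on the model and the data size, which this toy example does not attempt to show.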

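Prior-likelihood conflict diagnostics can be illustrated in the same hedged spirit. In message passing, the natural parameters of a Gaussian factor q(b_i) are the sum of its incoming messages, so subtracting the prior message leaves a Gaussian summary of what the likelihood says about b_i. The sketch below assumes a scalar random effect with a Gaussian prior message N(m0, s0^2) and posterior approximation N(m, s^2), and scores conflict with the two-sided tail probability of the standardized difference between the prior and likelihood summaries. This is one simple Gaussian form of such a diagnostic; the article's exact definition may differ, and the function names `likelihood_message` and `conflict_pvalue` are hypothetical.

```python
import math


def likelihood_message(post_mean, post_var, prior_mean, prior_var):
    """Recover the Gaussian 'likelihood' message for a scalar node.

    Assumes q(b) = N(post_mean, post_var) is the product of a prior message
    N(prior_mean, prior_var) and a Gaussian likelihood message, so the natural
    parameters (precision and precision * mean) subtract.
    """
    prec = 1.0 / post_var - 1.0 / prior_var
    if prec <= 0:
        raise ValueError("posterior must be more precise than the prior message")
    lik_var = 1.0 / prec
    lik_mean = lik_var * (post_mean / post_var - prior_mean / prior_var)
    return lik_mean, lik_var


def conflict_pvalue(prior_mean, prior_var, lik_mean, lik_var):
    """Two-sided p-value for the hypothesis that prior and likelihood agree.

    Under that hypothesis, lik_mean - prior_mean is treated as
    N(0, prior_var + lik_var); small values flag a conflict.
    """
    z = (lik_mean - prior_mean) / math.sqrt(prior_var + lik_var)
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


# Example: a random effect whose data pull it well away from its prior.
lik_mean, lik_var = likelihood_message(1.2, 0.04, 0.0, 0.25)
print(lik_mean, lik_var, conflict_pvalue(0.0, 0.25, lik_mean, lik_var))
```

Because everything here is read off messages that the fitting algorithm already computes, no extra simulation is needed, which is the contrast with Markov chain Monte Carlo approaches drawn in the abstract.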