Random Effects Models with Deep Neural Network Basis Functions: Methodology and Computation

Deep neural networks (DNNs) are a powerful tool for function approximation. We describe flexible versions of generalized linear and generalized linear mixed models that incorporate basis functions formed by a deep neural network. The combination of neural networks and random effects appears little explored in the literature, perhaps because of the computational challenges of incorporating subject-specific parameters into already complex models. Efficient computational methods for Bayesian inference are developed based on Gaussian variational approximation. A parsimonious but flexible factor parametrization of the covariance matrix is used in the Gaussian variational approximation. We implement natural gradient methods for the optimization, exploiting the factor structure of the variational covariance matrix to perform fast matrix-vector multiplications within the iterative conjugate gradient linear solvers used in the natural gradient computations. The method scales to high dimensions, and the use of the natural gradient allows faster and more stable convergence of the variational algorithm. In the case of random effects, we compute unbiased estimates of the gradient of the lower bound for the model with the random effects integrated out by making use of Fisher's identity. The proposed methods are illustrated in several examples, including DNN random effects models and high-dimensional logistic regression with sparse signal shrinkage priors.
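To make the factor parametrization concrete, the sketch below draws a reparameterized sample from a Gaussian variational posterior with covariance Sigma = B B^T + D^2 (low-rank loadings B plus a diagonal term) and forms a single-sample unbiased estimate of the lower-bound gradient with respect to the variational mean. This is a minimal sketch under stated assumptions, not the paper's implementation: `grad_log_joint`, `sample_theta`, and `grad_mu_estimate` are hypothetical names, and the user must supply the gradient of the log joint density of their model.

```python
import numpy as np

def sample_theta(mu, B, d, rng):
    """Reparameterized draw from N(mu, B @ B.T + np.diag(d**2)).

    mu : (p,)   variational mean
    B  : (p, k) factor loading matrix, with k << p
    d  : (p,)   idiosyncratic standard deviations
    """
    z = rng.standard_normal(B.shape[1])     # low-dimensional factor noise
    eps = rng.standard_normal(mu.shape[0])  # elementwise idiosyncratic noise
    return mu + B @ z + d * eps

def grad_mu_estimate(grad_log_joint, mu, B, d, rng):
    """Single-sample unbiased estimate of the lower-bound gradient w.r.t. mu.

    For a Gaussian variational family the entropy term does not depend on mu,
    so under the reparameterization trick the gradient reduces to
    E_q[ grad_theta log p(y, theta) ], estimated here with one draw."""
    theta = sample_theta(mu, B, d, rng)
    return grad_log_joint(theta)

# Toy usage with a standard-normal log joint, so grad_log_joint(theta) = -theta.
rng = np.random.default_rng(0)
p, k = 500, 4
mu, B, d = np.zeros(p), 0.1 * rng.standard_normal((p, k)), np.ones(p)
g_hat = grad_mu_estimate(lambda theta: -theta, mu, B, d, rng)
```

Storing only B and d keeps the variational parameter count at p(k + 2) rather than the O(p^2) of a full covariance, which is what makes the approximation usable in high dimensions.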
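A natural gradient step premultiplies the ordinary gradient by the inverse Fisher information of the variational family. Rather than forming or inverting that matrix, the linear system can be solved iteratively using only matrix-vector products. The sketch below pairs a textbook conjugate gradient solver with an O(pk) product for a low-rank-plus-diagonal matrix of the form B B^T + D^2, to illustrate the kind of structured multiplication the factor parametrization makes cheap; the exact Fisher blocks used in the paper are not reproduced here, and `fisher_matvec` stands in for a user-supplied callable.

```python
import numpy as np

def conjugate_gradient(fisher_matvec, g, tol=1e-8, max_iter=100):
    """Solve F x = g for symmetric positive definite F, accessed only
    through products F @ v; returns an approximate natural gradient."""
    x = np.zeros_like(g)
    r = g - fisher_matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Fp = fisher_matvec(p)
        alpha = rs / (p @ Fp)
        x += alpha * p
        r -= alpha * Fp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def factor_matvec(B, d, v):
    """Product (B @ B.T + np.diag(d**2)) @ v in O(p k) time, never
    forming the p x p matrix explicitly."""
    return B @ (B.T @ v) + (d ** 2) * v
```

Because each conjugate gradient iteration costs only a few such structured products, the natural gradient step remains feasible even when the parameter dimension p runs into the tens of thousands.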
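The role Fisher's identity plays in the random effects case can be stated in one line: with random effects b integrated out, the gradient of the marginal log-likelihood equals a posterior expectation of the complete-data gradient, so averaging that gradient over draws of b given (y, theta) yields an unbiased estimate. The notation below is generic rather than the paper's.

```latex
% Fisher's identity: marginal score as a posterior expectation
% of the complete-data score.
\nabla_{\theta} \log p(y \mid \theta)
  = \mathbb{E}_{b \sim p(b \mid y, \theta)}
    \left[ \nabla_{\theta} \log p(y, b \mid \theta) \right]
```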
