Scalable and Accurate Variational Bayes for High-Dimensional Binary Regression Models

State-of-the-art methods for Bayesian inference on regression models with binary responses are either computationally impractical or inaccurate in high dimensions. To fill this gap, we propose a novel variational approximation for the posterior distribution of the coefficients in high-dimensional probit regression with Gaussian priors. Our method leverages a representation with global and local variables but, unlike classical mean-field approaches, it avoids a fully factorized approximation and instead assumes a factorization only for the local variables. We prove that the resulting variational approximation belongs to a tractable class of unified skew-normal distributions that preserves the skewness of the actual posterior and, unlike state-of-the-art variational Bayes solutions, converges to the exact posterior as the number of predictors p increases. A scalable coordinate ascent variational algorithm is proposed to obtain the optimal parameters of the approximating densities. As shown in theoretical studies and in an application to Alzheimer's data, this routine requires a number of iterations converging to one as p diverges to infinity, and can easily scale to large p settings where expectation-propagation and state-of-the-art Markov chain Monte Carlo algorithms are computationally impractical.
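
To make the partially factorized idea concrete, the following is a minimal Python sketch of a coordinate ascent routine of the kind described above. It is a reconstruction under stated assumptions, not the authors' implementation. It assumes the usual latent-variable form of probit regression, z_i ~ N(x_i'β, 1) with y_i = 1(z_i > 0), and a prior β ~ N(0, ν² I_p). Under these assumptions the global block q(β | z) is kept equal to the exact Gaussian conditional p(β | z), and only the local variables are factorized, each q(z_i) being a truncated normal. The function names `pfm_vb_probit` and `inv_mills`, the default `nu2=25.0`, and the stopping rule are hypothetical choices made for illustration.

```python
import math
import numpy as np


def inv_mills(a):
    """Return phi(a) / Phi(a) for the standard normal pdf phi and cdf Phi."""
    if a < -30.0:
        # Asymptotic expansion, used to avoid 0/0 when Phi(a) underflows.
        return -a - 1.0 / a
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-a / math.sqrt(2.0))
    return pdf / cdf


def pfm_vb_probit(X, y, nu2=25.0, tol=1e-6, max_iter=1000):
    """Sketch of partially factorized variational Bayes for probit regression.

    Assumed model (not stated in the text above): z_i ~ N(x_i' beta, 1),
    y_i = 1(z_i > 0), beta ~ N(0, nu2 * I).
    Returns the approximate posterior mean of beta, the means of the
    truncated-normal factors q(z_i), and the number of sweeps performed.
    """
    n, p = X.shape
    # Marginally z ~ N(0, I + nu2 * X X'); only an n x n inverse is needed,
    # which is the cheap direction when p is much larger than n.
    omega_inv = np.linalg.inv(np.eye(n) + nu2 * (X @ X.T))
    H = np.eye(n) - omega_inv            # equals X V X' with V = (X'X + I/nu2)^{-1}
    h = np.diag(H).copy()
    sigma2 = 1.0 / (1.0 - h)             # conditional variance of z_i given z_{-i}
    sigma = np.sqrt(sigma2)
    s = 2.0 * np.asarray(y, dtype=float) - 1.0   # +1 if y_i = 1, -1 if y_i = 0

    z_bar = np.zeros(n)                  # current E_q[z_i]
    n_iter = max_iter
    for it in range(max_iter):
        z_old = z_bar.copy()
        for i in range(n):
            # Location of q(z_i): uses the other factors only through their means.
            mu_i = sigma2[i] * (H[i] @ z_bar - h[i] * z_bar[i])
            # Mean of N(mu_i, sigma2_i) truncated to the half-line with sign s_i.
            z_bar[i] = mu_i + s[i] * sigma[i] * inv_mills(s[i] * mu_i / sigma[i])
        if np.max(np.abs(z_bar - z_old)) < tol:
            n_iter = it + 1
            break

    # E_q[beta] = V X' E_q[z] = nu2 * X' (I + nu2 X X')^{-1} E_q[z]
    beta_mean = nu2 * (X.T @ (omega_inv @ z_bar))
    return beta_mean, z_bar, n_iter
```

For example, `pfm_vb_probit(X, y)` with an `n x p` design matrix `X` and a 0/1 response vector `y` returns the approximate posterior mean of the coefficients. The sketch forms dense `n x n` matrices for readability and reports only first moments; it makes no attempt to reproduce the skewness results or the scalability refinements claimed in the text.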
