LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations

Due to the ease of modern data collection, applied statisticians often have access to a large set of covariates that they wish to relate to some observed outcome. Generalized linear models (GLMs) offer a particularly interpretable framework for such an analysis. In these high-dimensional problems, the number of covariates is often large relative to the number of observations, so we face non-trivial inferential uncertainty; a Bayesian approach allows coherent quantification of this uncertainty. Unfortunately, existing methods for Bayesian inference in GLMs require running times roughly cubic in parameter dimension, and so are limited to settings with at most tens of thousands of parameters. We propose to reduce time and memory costs with a low-rank approximation of the data in an approach we call LR-GLM. When used with the Laplace approximation or Markov chain Monte Carlo, LR-GLM provides a full Bayesian posterior approximation and admits running times reduced by a full factor of the parameter dimension. We rigorously establish the quality of our approximation and show how the choice of rank allows a tunable computational-statistical trade-off. Experiments support our theory and demonstrate the efficacy of LR-GLM on real large-scale datasets.
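To make the approach concrete, the following is a minimal sketch of the core idea for a logistic GLM with an isotropic Gaussian prior: replace the design matrix X with its rank-r approximation X U U^T (where U spans the top right singular subspace), fit the mode in the r-dimensional projected space, and form a Laplace approximation there before mapping back to the full parameter space. The function name, interface, and choice of optimizer here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize


def lr_glm_laplace(X, y, rank, prior_var=1.0):
    """Sketch of LR-GLM for Bayesian logistic regression (y in {-1, +1}).

    Projects covariates onto the top-`rank` right singular vectors of X,
    computes a Laplace approximation in the reduced space, and maps the
    resulting Gaussian back to the original D-dimensional space.
    """
    # Low-rank approximation of the data: X ~= (X U) U^T
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    U = Vt[:rank].T                 # D x r projection matrix
    Z = X @ U                       # N x r projected covariates

    # Negative log posterior: logistic likelihood + Gaussian prior
    def neg_log_post(b):
        nll = np.sum(np.logaddexp(0.0, -y * (Z @ b)))
        return nll + 0.5 * b @ b / prior_var

    # MAP estimate in the r-dimensional space (cost scales with r, not D)
    b_map = minimize(neg_log_post, np.zeros(rank), method="L-BFGS-B").x

    # Laplace approximation: Hessian of the negative log posterior at the mode
    p = 1.0 / (1.0 + np.exp(-(Z @ b_map)))
    W = p * (1.0 - p)
    H = (Z * W[:, None]).T @ Z + np.eye(rank) / prior_var

    # Map the r-dimensional Gaussian back up to D dimensions
    mean = U @ b_map
    cov = U @ np.linalg.inv(H) @ U.T   # low-rank piece of the posterior covariance
    return mean, cov
```

Only an r x r Hessian is ever formed or inverted, which is the source of the speedup over working with the full D x D matrix; the choice of `rank` trades this computational saving against approximation quality, as the paper's theory quantifies.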
