Efficient posterior sampling for high-dimensional imbalanced logistic regression.

Classification with high-dimensional data is of widespread interest and often involves dealing with imbalanced data. Bayesian classification approaches are hampered by the fact that current Markov chain Monte Carlo algorithms for posterior computation become inefficient as the number p of predictors or the number n of subjects to classify gets large, because of the increasing computational time per step and worsening mixing rates. One strategy is to employ a gradient-based sampler to improve mixing while using data subsamples to reduce the per-step computational complexity. However, the usual subsampling breaks down when applied to imbalanced data. Instead, we generalize piecewise-deterministic Markov chain Monte Carlo algorithms to include importance-weighted and mini-batch subsampling. These maintain the correct stationary distribution with arbitrarily small subsamples and substantially outperform current competitors. We provide theoretical support for the proposed approach and demonstrate its performance gains in simulated data examples and an application to cancer data.
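
To make the piecewise-deterministic construction concrete, below is a minimal sketch of a zig-zag sampler with subsampled gradients for Bayesian logistic regression. It is not the paper's implementation: it assumes a Gaussian N(0, sigma2*I) prior and plain uniform one-point subsampling (the importance-weighted and mini-batch variants proposed in the paper would replace the uniform draw of the data index), and the function name `zigzag_logistic` and its parameters are illustrative.

```python
# Minimal sketch (assumptions as stated above): zig-zag process with uniform
# one-point subsampling for logistic regression with a N(0, sigma2*I) prior.
import numpy as np

def zigzag_logistic(X, y, sigma2=100.0, n_events=10_000, rng=None):
    """Return the event-time skeleton (times, positions) of the zig-zag process."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    theta = np.zeros(d)
    v = rng.choice([-1.0, 1.0], size=d)   # velocity in {-1, +1}^d
    # n * max_j |x_{ji}| bounds the one-point estimate of the likelihood gradient,
    # since |sigmoid(x_j @ theta) - y_j| <= 1 for binary y_j.
    x_max = n * np.abs(X).max(axis=0)
    b = 1.0 / sigma2                      # slope of the affine rate bound (prior term)
    t, times, skeleton = 0.0, [0.0], [theta.copy()]
    for _ in range(n_events):
        # Affine upper bound a_i + b*s on each coordinate's switching rate, s >= 0.
        a = np.abs(theta) / sigma2 + x_max
        # First arrival time of a Poisson process with rate a_i + b*s:
        # solve a*tau + b*tau^2/2 = E, with E ~ Exp(1).
        e = rng.exponential(size=d)
        tau = (-a + np.sqrt(a**2 + 2.0 * b * e)) / b
        i = int(np.argmin(tau))           # coordinate proposing the next event
        s = tau[i]
        theta += s * v                    # deterministic linear drift between events
        t += s
        # Unbiased one-point estimate of the i-th partial derivative of -log posterior.
        j = rng.integers(n)
        sig = 1.0 / (1.0 + np.exp(-X[j] @ theta))
        grad_i = theta[i] / sigma2 + n * X[j, i] * (sig - y[j])
        # Thinning: flip v_i with probability (stochastic rate) / (bound).
        if rng.random() < max(0.0, v[i] * grad_i) / (a[i] + b * s):
            v[i] = -v[i]
        times.append(t)
        skeleton.append(theta.copy())
    return np.array(times), np.array(skeleton)
```

Because the trajectory is piecewise linear between stored event times, posterior means of the coefficients are computed by integrating along the path (a trapezoidal rule over the skeleton is exact for linear functionals of theta), not by averaging the skeleton points; the key property illustrated is that the flip rates only ever touch one data point per event, yet the estimator is unbiased, so the correct posterior is preserved.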
