GPU-accelerated Gibbs sampling: a case study of the Horseshoe Probit model

Gibbs sampling is a widely used Markov chain Monte Carlo (MCMC) method for numerically approximating integrals of interest in Bayesian statistics and other mathematical sciences. Many implementations of MCMC methods do not extend easily to parallel computing environments, because their inherently sequential nature incurs a large synchronization cost. In the case study presented in this paper, we show how to perform Gibbs sampling in a fully data-parallel manner on a graphics processing unit (GPU) for a large class of exchangeable models that admit latent variable representations. Our approach takes a systems perspective, emphasizing efficient use of compute hardware. We demonstrate our method on a Horseshoe Probit regression model and find that our implementation scales effectively to thousands of predictors and millions of data points simultaneously.
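As a minimal sketch of why a latent variable representation makes Gibbs sampling data-parallel, the NumPy code below runs a Horseshoe Probit sampler on the CPU. It assumes the Albert and Chib latent-utility representation (z_i ~ N(x_i^T beta, 1), with y_i = 1 exactly when z_i > 0) and the inverse-gamma auxiliary-variable form of the horseshoe prior described by Makalic and Schmidt. These modelling choices, the truncated-normal sampler, and every function name (`sample_truncated_std_normal`, `inv_gamma`, `gibbs_sweep`) are illustrative assumptions, not the paper's GPU implementation. The point of the sketch is structural: each step is either an elementwise operation over all n data points or all p coefficients, or a dense matrix product, and those are the operations a GPU executes in parallel.

```python
import numpy as np


def sample_truncated_std_normal(a, rng):
    """Draw w_i ~ N(0, 1) restricted to [a_i, inf) for a 1-D array a, all i at once."""
    a = np.asarray(a, dtype=float)
    w = np.empty_like(a)
    pending = np.ones(a.shape, dtype=bool)
    while pending.any():
        idx = np.flatnonzero(pending)
        ai = a[idx]
        naive = ai <= 0.0
        # a <= 0: plain rejection from N(0, 1); acceptance rate is at least 1/2.
        x_naive = rng.standard_normal(idx.size)
        # a > 0: Robert's translated-exponential proposal with its optimal rate alpha.
        alpha = 0.5 * (ai + np.sqrt(ai * ai + 4.0))
        x_exp = ai + rng.exponential(1.0, idx.size) / alpha
        accept_exp = rng.random(idx.size) <= np.exp(-0.5 * (x_exp - alpha) ** 2)
        x = np.where(naive, x_naive, x_exp)
        ok = np.where(naive, x_naive >= ai, accept_exp)
        # Keep accepted draws; only the rejected entries are redrawn next pass.
        w[idx[ok]] = x[ok]
        pending[idx[ok]] = False
    return w


def inv_gamma(shape, rate, rng):
    """Inverse-gamma draw as the reciprocal of a Gamma(shape, scale=1/rate) draw."""
    return 1.0 / rng.gamma(shape, 1.0 / rate)


def gibbs_sweep(X, y, state, rng):
    """One systematic-scan Gibbs sweep for a (hypothetical) horseshoe probit model."""
    beta, lam2, tau2, nu, xi = state
    p = X.shape[1]
    # z | beta, y: the n latent utilities are conditionally independent given beta,
    # so all of them are drawn in a single data-parallel step.
    mu = X @ beta
    s = 2.0 * y - 1.0  # +1 where y = 1 (need z > 0), -1 where y = 0 (need z <= 0)
    z = mu + s * sample_truncated_std_normal(-s * mu, rng)
    # beta | z, lam2, tau2 ~ N(A^{-1} X^T z, A^{-1}), A = X^T X + diag(1 / (lam2 * tau2)).
    A = X.T @ X + np.diag(1.0 / (lam2 * tau2))
    L = np.linalg.cholesky(A)  # A = L L^T
    mean = np.linalg.solve(L.T, np.linalg.solve(L, X.T @ z))
    beta = mean + np.linalg.solve(L.T, rng.standard_normal(p))  # covariance A^{-1}
    # Horseshoe scales via inverse-gamma auxiliary variables; local updates are
    # elementwise over the p coefficients, the global ones are scalars.
    lam2 = inv_gamma(1.0, 1.0 / nu + beta**2 / (2.0 * tau2), rng)
    tau2 = inv_gamma(0.5 * (p + 1), 1.0 / xi + 0.5 * np.sum(beta**2 / lam2), rng)
    nu = inv_gamma(1.0, 1.0 + 1.0 / lam2, rng)
    xi = inv_gamma(1.0, 1.0 + 1.0 / tau2, rng)
    return beta, lam2, tau2, nu, xi


if __name__ == "__main__":
    # Usage on synthetic data: two active predictors, three null ones.
    rng = np.random.default_rng(0)
    n, p = 2000, 5
    X = rng.standard_normal((n, p))
    beta_true = np.array([1.5, -1.0, 0.0, 0.0, 0.0])
    y = (X @ beta_true + rng.standard_normal(n) > 0).astype(float)
    state = (np.zeros(p), np.ones(p), 1.0, np.ones(p), 1.0)
    draws = []
    for t in range(500):
        state = gibbs_sweep(X, y, state, rng)
        if t >= 200:  # discard burn-in
            draws.append(state[0])
    print("posterior mean of beta:", np.round(np.mean(draws, axis=0), 2))
```

In this sketch the only step that is not elementwise is the p-by-p Cholesky solve for beta; the products `X @ beta` and `X.T @ z` are where the cost in n lives, and `X.T @ X` is constant across sweeps and could be precomputed once.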
