AcMC2: Accelerating Markov Chain Monte Carlo Algorithms for Probabilistic Models

Probabilistic models (PMs) are ubiquitously used across a variety of machine learning applications. They have been shown to successfully integrate structural prior information about data and effectively quantify uncertainty to enable the development of more powerful, interpretable, and efficient learning algorithms. This paper presents AcMC2, a compiler that transforms PMs into optimized hardware accelerators (for use in FPGAs or ASICs) that utilize Markov chain Monte Carlo methods to infer and query a distribution of posterior samples from the model. The compiler analyzes statistical dependencies in the PM to drive several optimizations to maximally exploit the parallelism and data locality available in the problem. We demonstrate the use of AcMC2 to implement several learning and inference tasks on a Xilinx Virtex-7 FPGA. AcMC2-generated accelerators provide a 47-100× improvement in runtime performance over a 6-core IBM Power8 CPU and an 8-18× improvement over an NVIDIA K80 GPU. This corresponds to a 753-1600× improvement over the CPU and 248-463× over the GPU in performance-per-watt terms.
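
As a rough illustration of how statistical dependencies can expose parallelism in MCMC, the sketch below colors a model's dependency graph so that variables sharing a color have no direct dependency and could, in a Gibbs-style sampler, be updated concurrently. This is a generic graph-coloring schedule and is not claimed to be the optimization AcMC2 itself performs; the function names and the four-variable chain model are hypothetical.

```python
from collections import defaultdict


def greedy_coloring(neighbors):
    """Give each variable the smallest color not used by an already-colored neighbor."""
    colors = {}
    for v in sorted(neighbors):
        used = {colors[u] for u in neighbors[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


def parallel_groups(neighbors):
    """Group variables by color; variables in one group share no edge."""
    groups = defaultdict(list)
    for v, c in greedy_coloring(neighbors).items():
        groups[c].append(v)
    return [sorted(groups[c]) for c in sorted(groups)]


# Hypothetical dependency graph (an assumption for illustration):
# a chain x0 - x1 - x2 - x3, given as an adjacency list.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
groups = parallel_groups(chain)
print(groups)  # [[0, 2], [1, 3]]
```

Under this schedule a sampler would sweep the groups one after another and update all variables within a group at once, which is the kind of structure a hardware pipeline could map onto parallel sampling units.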


[66]  Andrew Gelman,et al.  The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo , 2011, J. Mach. Learn. Res..