Statistical Robustness of MCMC Accelerators

Statistical machine learning often uses probabilistic algorithms, such as Markov Chain Monte Carlo (MCMC), to solve a wide range of problems. Probabilistic computations, often considered too slow on conventional processors, can be accelerated with specialized hardware by exploiting parallelism and optimizing the design using various approximation techniques. Current methodologies for evaluating correctness of probabilistic accelerators are often incomplete, mostly focusing only on end-point result quality (“accuracy”). It is important for hardware designers and domain experts to look beyond end-point “accuracy” and be aware of how hardware optimizations impact statistical properties. This work takes a first step toward defining metrics and a methodology for quantitatively evaluating correctness of probabilistic accelerators. We propose three pillars of statistical robustness: 1) sampling quality, 2) convergence diagnostic, and 3) goodness of fit. We apply our framework to a representative MCMC accelerator and surface design issues that cannot be exposed using only application end-point result quality. We demonstrate the benefits of this framework to guide design space exploration in a case study showing that statistical robustness comparable to floating-point software can be achieved with limited precision, avoiding floating-point hardware overheads.
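As a rough illustration of what the three pillars might look like in practice, the sketch below pairs each one with a commonly used metric: effective sample size for sampling quality, the Gelman-Rubin potential scale reduction factor for convergence, and Jensen-Shannon divergence for goodness of fit. The choice of these specific metrics, the function names, and the simple autocorrelation truncation rule are assumptions made for illustration, not the framework's definitions.

```python
import numpy as np

def effective_sample_size(x, max_lag=None):
    """Sampling quality (assumed metric): ESS = n / autocorrelation time.
    The autocorrelation sum is truncated at the first non-positive lag."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0.0:
        return float(n)
    max_lag = max_lag or n // 2
    tau = 1.0
    for k in range(1, max_lag):
        rho = np.dot(x[:-k], x[k:]) / (n * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau

def gelman_rubin(chains):
    """Convergence diagnostic (assumed metric): potential scale reduction
    factor for an array of shape (num_chains, num_samples)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    between = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance B
    within = chains.var(axis=1, ddof=1).mean()      # within-chain variance W
    var_hat = (n - 1) / n * within + between / n
    return float(np.sqrt(var_hat / within))

def jensen_shannon(p, q):
    """Goodness of fit (assumed metric): Jensen-Shannon divergence in bits
    between two discrete distributions given as histograms."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    mix = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

# Hypothetical usage: compare samples from a reference software sampler
# with samples from an accelerator model (both are synthetic stand-ins here).
rng = np.random.default_rng(0)
reference = rng.integers(0, 4, size=(4, 2000))
accelerator = rng.integers(0, 4, size=(4, 2000))
ess = effective_sample_size(accelerator[0])
rhat = gelman_rubin(accelerator)
jsd = jensen_shannon(np.bincount(reference.ravel(), minlength=4),
                     np.bincount(accelerator.ravel(), minlength=4))
```

Two designs can report the same end-point result while differing in any of these three quantities, which is why the metrics are tracked separately rather than folded into a single score.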
