MC2RAM: Markov Chain Monte Carlo Sampling in SRAM for Fast Bayesian Inference

This work presents an implementation of Markov Chain Monte Carlo (MCMC) sampling from an arbitrary Gaussian mixture model (GMM) within SRAM. We propose a novel SRAM architecture that embeds random number generators (RNGs), digital-to-analog converters (DACs), and analog-to-digital converters (ADCs) so that SRAM arrays can perform high-performance MCMC sampling based on the Metropolis-Hastings (MH) algorithm. Most of the expensive computations are performed within the SRAM and can be parallelized for high-speed sampling. Our iterative compute flow minimizes data movement during sampling. We characterize the power-performance trade-off of our design through simulations in 45 nm CMOS technology. For a two-dimensional, two-mixture GMM, the implementation consumes ~91 µW per sampling iteration and produces 500 samples in 2,000 clock cycles on average at a 1 GHz clock frequency. Our study highlights how low-level hardware non-idealities affect high-level sampling characteristics, and recommends ways to operate SRAM optimally within area/power constraints for high-performance sampling.
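For context, the sketch below shows the random-walk Metropolis-Hastings procedure that the hardware accelerates, applied to a two-dimensional, two-component GMM in plain Python. The mixture parameters, proposal scale, and starting point are illustrative assumptions, not values from the paper, and the software arithmetic here stands in for the in-SRAM analog computation and embedded RNGs described above.

```python
import numpy as np

# Hypothetical 2-D, two-component GMM target (parameters are placeholders,
# not taken from the paper): equal weights, isotropic unit-variance modes.
weights = np.array([0.5, 0.5])
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
sigma = 1.0  # per-component standard deviation

def gmm_density(x):
    """Unnormalized GMM density at a 2-D point x."""
    sq_dists = np.sum((x - means) ** 2, axis=1)   # squared distance to each mean
    return np.sum(weights * np.exp(-sq_dists / (2.0 * sigma ** 2)))

def metropolis_hastings(n_samples, proposal_std=1.0, seed=0):
    """Random-walk MH chain targeting the GMM above."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)                # arbitrary starting point
    p_x = gmm_density(x)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the MH acceptance ratio
        # reduces to p(x') / p(x).
        x_prop = x + rng.normal(scale=proposal_std, size=2)
        p_prop = gmm_density(x_prop)
        if rng.uniform() < p_prop / p_x:   # accept with prob min(1, ratio)
            x, p_x = x_prop, p_prop
        samples.append(x.copy())           # a rejected step repeats the state
    return np.array(samples)

chain = metropolis_hastings(500)
print(chain.mean(axis=0))  # should land near the mixture mean (0, 0)
```

Because the Gaussian proposal is symmetric, the proposal densities cancel in the acceptance ratio; the cost per iteration is dominated by the target-density evaluation, which is the step the SRAM-embedded design parallelizes.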
