Scalable Discrete Sampling as a Multi-Armed Bandit Problem

Drawing a sample from a discrete distribution is one of the building components for Monte Carlo methods. Like other sampling algorithms, discrete sampling suffers from the high computational burden in large-scale inference problems. We study the problem of sampling a discrete random variable with a high degree of dependency that is typical in large-scale Bayesian inference and graphical models, and propose an efficient approximate solution with a subsampling approach. We make a novel connection between the discrete sampling and Multi-Armed Bandits problems with a finite reward population and provide three algorithms with theoretical guarantees. Empirical evaluations show the robustness and efficiency of the approximate algorithms in both synthetic and real-world large-scale problems.

[1]  Ahn Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring , 2012 .

[2]  N. Pillai,et al.  Ergodicity of Approximate MCMC Chains with Applications to Large Data Sets , 2014, 1405.0182.

[3]  Robert E. Bechhofer,et al.  A Sequential Multiple-Decision Procedure for Selecting the Best One of Several Normal Populations with a Common Unknown Variance, and Its Use with Various Experimental Designs , 1958 .

[4]  Max Welling,et al.  Distributed Algorithms for Topic Models , 2009, J. Mach. Learn. Res..

[5]  Christopher D. Manning,et al.  Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling , 2005, ACL.

[6]  James R. Wilson Variance Reduction Techniques for Digital Simulation , 1984 .

[7]  Andrew McCallum,et al.  Monte Carlo MCMC: Efficient Inference by Approximate Sampling , 2012, EMNLP.

[8]  E. Paulson A Sequential Procedure for Selecting the Population with the Largest Mean from $k$ Normal Populations , 1964 .

[9]  Yee Whye Teh,et al.  Distributed Bayesian Posterior Sampling via Moment Sharing , 2014, NIPS.

[10]  R. Serfling Probability Inequalities for the Sum in Sampling without Replacement , 1974 .

[11]  B. Carlin,et al.  Bayesian Model Choice Via Markov Chain Monte Carlo Methods , 1995 .

[12]  Arnaud Doucet,et al.  Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach , 2014, ICML.

[13]  Andrew W. Moore,et al.  Hoeffding Races: Accelerating Model Selection Search for Classification and Function Approximation , 1993, NIPS.

[14]  Edward I. George,et al.  Bayes and big data: the consensus Monte Carlo algorithm , 2016, Big Data and Information Theory.

[15]  Breck Baldwin,et al.  Algorithms for Scoring Coreference Chains , 1998 .

[16]  Odalric-Ambrym Maillard,et al.  Concentration inequalities for sampling without replacement , 2013, 1309.4029.

[17]  Manfred K. Warmuth,et al.  Optimum Follow the Leader Algorithm , 2005, COLT.

[18]  George Papandreou,et al.  Perturb-and-MAP random fields: Using discrete optimization to learn and sample from energy models , 2011, 2011 International Conference on Computer Vision.

[19]  Peter Auer,et al.  Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.

[20]  Zoubin Ghahramani,et al.  Scaling the iHMM: Parallelization versus Hadoop , 2010, 2010 10th IEEE International Conference on Computer and Information Technology.

[21]  Max Welling,et al.  Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget , 2013, ICML 2014.

[22]  Dorota Glowacka,et al.  SwiftLink: parallel MCMC linkage analysis using multicore CPU and GPU , 2013, Bioinform..

[23]  Qing Yang,et al.  Efficient Multicore Collaborative Filtering , 2011, ArXiv.

[24]  Ahn,et al.  Bayesian posterior sampling via stochastic gradient Fisher scoring Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring , 2012 .

[25]  Xi Chen,et al.  Variance Reduction for Stochastic Gradient Optimization , 2013, NIPS.

[26]  Alexander J. Smola,et al.  Reducing the sampling complexity of topic models , 2014, KDD.

[27]  A. Y. Mitrophanov,et al.  Sensitivity and convergence of uniformly ergodic Markov chains , 2005 .

[28]  Tianqi Chen,et al.  Stochastic Gradient Hamiltonian Monte Carlo , 2014, ICML.

[29]  Stephen G. Walker,et al.  Slice sampling mixture models , 2011, Stat. Comput..

[30]  Ryan Babbush,et al.  Bayesian Sampling Using Stochastic Gradient Thermostats , 2014, NIPS.

[31]  W. Hoeffding Probability Inequalities for sums of Bounded Random Variables , 1963 .

[32]  Léon Bottou,et al.  The Tradeoffs of Large Scale Learning , 2007, NIPS.

[33]  Arnaud Doucet,et al.  On Markov chain Monte Carlo methods for tall data , 2015, J. Mach. Learn. Res..

[34]  Matthew Malloy,et al.  lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits , 2013, COLT.

[35]  Ryan P. Adams,et al.  Firefly Monte Carlo: Exact MCMC with Subsets of Data , 2014, UAI.

[36]  Ahn Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC , 2015 .

[37]  Yee Whye Teh,et al.  Bayesian Learning via Stochastic Gradient Langevin Dynamics , 2011, ICML.

[38]  Padhraic Smyth,et al.  Approximate Slice Sampling for Bayesian Posterior Inference , 2014, AISTATS.