Paper Information - Scalable Discrete Sampling as a Multi-Armed Bandit Problem

Drawing a sample from a discrete distribution is one of the building blocks of Monte Carlo methods. Like other sampling algorithms, discrete sampling suffers from a high computational burden in large-scale inference problems. We study the problem of sampling a discrete random variable with a high degree of dependency, as is typical in large-scale Bayesian inference and graphical models, and propose an efficient approximate solution based on subsampling. We make a novel connection between discrete sampling and the multi-armed bandit problem with a finite reward population, and provide three algorithms with theoretical guarantees. Empirical evaluations show the robustness and efficiency of the approximate algorithms on both synthetic and real-world large-scale problems.
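The abstract does not spell out how sampling becomes a bandit problem. One standard reduction, assumed here for illustration, is the Gumbel-max trick: adding independent Gumbel noise to each state's log-weight and taking the argmax yields an exact sample, so sampling turns into finding the "best arm". When each log-weight is a sum over many terms, that sum can be estimated from a subset of the terms. The sketch below shows only this reduction and a naive subsampled estimate; it is not one of the paper's three algorithms, and the toy data and the names `gumbel_max_sample` and `subsampled_log_weights` are hypothetical.

```python
import numpy as np


def gumbel_max_sample(log_weights, rng):
    """Draw one index i with probability proportional to exp(log_weights[i])."""
    gumbel_noise = rng.gumbel(size=log_weights.shape)
    return int(np.argmax(log_weights + gumbel_noise))


def subsampled_log_weights(log_factors, batch_size, rng):
    """Estimate each state's summed log-factor from a random subset of its terms.

    log_factors has shape (num_states, num_terms); terms are drawn without
    replacement and the subset mean is rescaled to the full sum.
    """
    num_terms = log_factors.shape[1]
    idx = rng.choice(num_terms, size=batch_size, replace=False)
    return num_terms * log_factors[:, idx].mean(axis=1)


rng = np.random.default_rng(0)

# Hypothetical toy problem: 4 states, each log-weight is a sum of 1000 terms.
log_factors = rng.normal(
    loc=[[0.0], [0.001], [0.002], [0.003]], scale=0.01, size=(4, 1000)
)
exact_log_weights = log_factors.sum(axis=1)

# Target distribution implied by the exact log-weights (softmax).
target = np.exp(exact_log_weights - exact_log_weights.max())
target /= target.sum()

# Empirical frequencies of Gumbel-max draws using the exact log-weights.
draws = np.array([gumbel_max_sample(exact_log_weights, rng) for _ in range(20000)])
empirical = np.bincount(draws, minlength=4) / draws.size

# A cheaper, approximate draw that looks at only 100 of the 1000 terms.
approx_log_weights = subsampled_log_weights(log_factors, batch_size=100, rng=rng)
approx_draw = gumbel_max_sample(approx_log_weights, rng)
```

A fixed batch size, as used here, gives no control over the approximation error. Under this framing, a bandit-style procedure would instead decide adaptively how many terms to inspect for each state before committing to the argmax.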

Zoubin Ghahramani | Yutian Chen
