Subsampling-Based Approximate Monte Carlo for Discrete Distributions

Drawing a sample from a discrete distribution is one of the fundamental building blocks of Monte Carlo methods. Like other sampling algorithms, discrete sampling suffers from a high computational burden in large-scale inference problems. We study the problem of sampling a discrete random variable with a high degree of dependency, as is typical in large-scale Bayesian inference and graphical models, and propose an efficient approximate solution based on subsampling. We make a novel connection between discrete sampling and Multi-Armed Bandit problems with a finite reward population, and provide three algorithms with theoretical guarantees. Empirical evaluations show the robustness and efficiency of the approximate algorithms on both synthetic and real-world large-scale problems.
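To make the idea concrete, here is a minimal Python sketch, not the paper's exact algorithm: it assumes each category's unnormalised log-probability is an average of N bounded per-data-point terms phi[k, n], and combines the Gumbel-max trick with a simplified Hoeffding-style confidence radius to race the categories on subsampled terms. The function name, signature, fixed batch size, shared subsampling order, and the crude (non-anytime) confidence bound are all illustrative assumptions; the paper's bounds for sampling without replacement would be tighter.

```python
import numpy as np


def approx_gumbel_max_sample(phi, batch_size=64, delta=0.05, rng=None):
    """Approximately draw k ~ softmax_k( mean_n phi[k, n] ) by racing
    Gumbel-perturbed, subsampled score estimates of each category.

    phi : (K, N) array; phi[k, n] is the n-th data term of category k's
          unnormalised log-probability (assumed bounded, so a Hoeffding-style
          confidence bound applies).  Illustrative sketch only.
    """
    rng = np.random.default_rng() if rng is None else rng
    K, N = phi.shape
    gumbel = rng.gumbel(size=K)       # Gumbel-max trick: sample = argmax_k (score_k + g_k)
    R = float(phi.max() - phi.min())  # crude global range of the terms

    perm = rng.permutation(N)         # shared subsampling order over data terms
    counts = np.zeros(K, dtype=int)
    sums = np.zeros(K)
    active = np.ones(K, dtype=bool)
    means = gumbel.copy()

    while active.sum() > 1 and counts[active].min() < N:
        # Read another batch of terms for every surviving category.
        for k in np.flatnonzero(active):
            take = perm[counts[k]:counts[k] + batch_size]
            sums[k] += phi[k, take].sum()
            counts[k] += take.size
        means = sums / np.maximum(counts, 1) + gumbel
        # Simplified fixed-confidence radius (exact once all terms are seen).
        rad = np.where(counts < N,
                       R * np.sqrt(np.log(2 * K / delta) / (2 * np.maximum(counts, 1))),
                       0.0)
        # Eliminate categories whose upper bound falls below the best lower bound.
        best_lower = np.max(np.where(active, means - rad, -np.inf))
        active &= (means + rad) >= best_lower

    return int(np.argmax(np.where(active, means, -np.inf)))
```

Setting batch_size = N makes the loop equivalent to exact Gumbel-max sampling over the full scores; with smaller batches the race typically resolves the winning category after reading only a fraction of the K x N terms, which is the source of the computational savings described above.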
