Discrete Sampling using Semigradient-based Product Mixtures

We consider the problem of inference in discrete probabilistic models, that is, distributions over subsets of a finite ground set. These encompass a range of well-known models in machine learning, such as determinantal point processes and Ising models. Locally-moving Markov chain Monte Carlo algorithms, such as the Gibbs sampler, are commonly used for inference in such models, but their convergence is at times prohibitively slow. This is often caused by state-space bottlenecks that greatly hinder the movement of such samplers. We propose a novel sampling strategy that uses a specific mixture of product distributions to propose global moves and thus accelerate convergence. Furthermore, we show how to construct such a mixture using semigradient information. We illustrate the effectiveness of combining our sampler with existing ones, both theoretically on an example model and empirically on three models learned from real-world data sets.
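To make the overall scheme concrete, the following is a minimal Python sketch, not the authors' exact algorithm, of how global moves drawn from a mixture of product (Bernoulli) distributions can be interleaved with local Gibbs updates via an independence Metropolis-Hastings correction. The mixture weights and Bernoulli marginals (`weights`, `probs`) are assumed to be given, e.g. obtained from semigradient information as described in the paper; their construction is not shown here, and the toy Ising-like model at the end is purely illustrative.

```python
import numpy as np

def log_proposal(x, weights, probs):
    """Log-density of a binary vector x under the mixture of products (assumed proposal)."""
    # Per-component log-density: sum_i [ x_i log q_ki + (1 - x_i) log(1 - q_ki) ]
    log_comp = (x * np.log(probs) + (1 - x) * np.log(1 - probs)).sum(axis=1)
    m = log_comp.max()
    # Log-sum-exp over mixture components weighted by `weights`.
    return m + np.log(np.dot(weights, np.exp(log_comp - m)))

def sample_proposal(rng, weights, probs):
    """Draw a binary vector from the mixture of product distributions."""
    k = rng.choice(len(weights), p=weights)
    return (rng.random(probs.shape[1]) < probs[k]).astype(float)

def mixture_mh_step(rng, x, log_p, weights, probs):
    """One global (independence) Metropolis-Hastings move using the mixture proposal."""
    y = sample_proposal(rng, weights, probs)
    log_alpha = (log_p(y) - log_p(x)
                 + log_proposal(x, weights, probs)
                 - log_proposal(y, weights, probs))
    return y if np.log(rng.random()) < log_alpha else x

def gibbs_sweep(rng, x, log_p):
    """One sweep of single-site Gibbs updates (local moves)."""
    for i in range(len(x)):
        x1, x0 = x.copy(), x.copy()
        x1[i], x0[i] = 1.0, 0.0
        p1 = 1.0 / (1.0 + np.exp(log_p(x0) - log_p(x1)))
        x[i] = float(rng.random() < p1)
    return x

if __name__ == "__main__":
    # Toy Ising-like model over n = 10 elements with hypothetical parameters.
    rng = np.random.default_rng(0)
    n = 10
    J = rng.normal(scale=0.2, size=(n, n)); J = (J + J.T) / 2
    log_p = lambda x: x @ J @ x                      # unnormalized log-probability
    weights = np.array([0.5, 0.5])                   # mixture weights (assumed given)
    probs = np.clip(rng.random((2, n)), 0.05, 0.95)  # Bernoulli marginals (assumed given)
    x = (rng.random(n) < 0.5).astype(float)
    for _ in range(1000):
        x = mixture_mh_step(rng, x, log_p, weights, probs)  # global move
        x = gibbs_sweep(rng, x, log_p)                      # local moves
```

The global proposal can jump across state-space bottlenecks that single-site Gibbs updates cross only very slowly, while the Metropolis-Hastings correction keeps the target distribution invariant.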
