Sampling from Probabilistic Submodular Models

Submodular and supermodular functions have found wide applicability in machine learning, capturing notions such as diversity and regularity, respectively. These notions have deep consequences for optimization, and the problem of (approximately) optimizing submodular functions has received much attention. However, beyond optimization, these notions allow specifying expressive probabilistic models that can be used to quantify predictive uncertainty via marginal inference. Prominent, well-studied special cases include Ising models and determinantal point processes, but the general class of log-submodular and log-supermodular models is much richer and little studied. In this paper, we investigate the use of Markov chain Monte Carlo sampling to perform approximate inference in general log-submodular and log-supermodular models. In particular, we consider a simple Gibbs sampling procedure and establish two sufficient conditions: the first guarantees polynomial-time mixing, and the second guarantees fast (O(n log n)) mixing. We also evaluate the efficiency of the Gibbs sampler on three examples of such models, and compare it against a recently proposed variational approach.
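To make the Gibbs sampling procedure mentioned above concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a distribution over subsets S of a ground set {0, ..., n-1} of the form p(S) ∝ exp(beta * F(S)), where F is a submodular (or supermodular) set function. The names `gibbs_sample`, `sigmoid`, and `coverage`, the parameter `beta`, and the toy coverage data are all hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gibbs_sample(F, n, beta=1.0, num_steps=1000, seed=0):
    """Single-site Gibbs sampler for p(S) proportional to exp(beta * F(S)).

    F is a set function taking a Python set of ints in range(n).
    Returns the state of the chain after num_steps updates.
    """
    rng = random.Random(seed)
    S = set()  # assumed starting state: the empty set
    for _ in range(num_steps):
        i = rng.randrange(n)  # pick one element uniformly at random
        with_i = S | {i}
        without_i = S - {i}
        # Conditional probability that i is in the set, given the rest:
        # p(i in S | rest) = 1 / (1 + exp(-beta * (F(S + i) - F(S - i))))
        p_add = sigmoid(beta * (F(with_i) - F(without_i)))
        S = with_i if rng.random() < p_add else without_i
    return S


# Illustrative submodular function: number of items covered by the
# chosen sets (toy data, assumed for this example only).
COVERS = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]


def coverage(S):
    covered = set()
    for i in S:
        covered |= COVERS[i]
    return float(len(covered))


if __name__ == "__main__":
    sample = gibbs_sample(coverage, n=len(COVERS), beta=0.5, num_steps=500)
    print(sorted(sample))
```

Each update needs only two evaluations of F, so the cost per step is governed by how cheaply F can be evaluated. How many steps are needed before the chain is close to its stationary distribution is exactly the mixing-time question the abstract addresses.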
