Provable Variational Inference for Constrained Log-Submodular Models

Submodular maximization problems appear in several areas of machine learning and data science, as many useful modelling concepts such as diversity and coverage satisfy the natural diminishing-returns property that defines submodularity. Because the data defining these functions, as well as the decisions made with the computed solutions, are subject to statistical noise and randomness, it is arguably necessary to go beyond computing a single approximate optimum and quantify its inherent uncertainty. To this end, we define a rich class of probabilistic models associated with constrained submodular maximization problems. These capture log-submodular dependencies of arbitrary order between the variables, but also satisfy hard combinatorial constraints. Namely, the variables are assumed to take on one of a (possibly exponentially large) set of states, which form the bases of a matroid. To perform inference in these models we design novel variational inference algorithms, which carefully leverage the combinatorial and probabilistic properties of these objects. In addition to providing completely tractable and well-understood variational approximations, our approach results in the minimization of a convex upper bound on the log-partition function. The bound can be efficiently evaluated using greedy algorithms and optimized using any first-order method. Moreover, for the case of facility location and weighted coverage functions, we prove the first constant-factor guarantee in this setting: an efficiently certifiable e/(e-1) approximation of the log-partition function. Finally, we empirically demonstrate the effectiveness of our approach on several instances.
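To make the objects named in the abstract concrete, the following toy sketch may help. It is an illustration under stated assumptions, not the paper's variational algorithm: it assumes a model of the form p(S) ∝ exp(F(S)) over the bases of a uniform matroid (all subsets of size k), takes F to be a facility location function with a made-up weight matrix, and computes the log-partition function by brute-force enumeration. The function names and the data are hypothetical. The only bound checked is the elementary sandwich max F ≤ log Z ≤ max F + log(number of bases), which is much weaker than the e/(e-1) guarantee stated above.

```python
import itertools
import math


def facility_location(weights, subset):
    """F(S) = sum over customers j of max_{i in S} weights[i][j]; F(empty) = 0."""
    if not subset:
        return 0.0
    n_customers = len(weights[0])
    return sum(max(weights[i][j] for i in subset) for j in range(n_customers))


def greedy_base(weights, k):
    """Greedily build a size-k set, i.e. a base of the uniform matroid of rank k."""
    n = len(weights)
    chosen = []
    for _ in range(k):
        # Add the element that yields the largest value of F on the extended set.
        best = max(
            (i for i in range(n) if i not in chosen),
            key=lambda i: facility_location(weights, chosen + [i]),
        )
        chosen.append(best)
    return chosen


def log_partition_brute_force(weights, k):
    """log Z = log of the sum of exp(F(S)) over all size-k subsets S (exponential time)."""
    n = len(weights)
    values = [
        facility_location(weights, list(subset))
        for subset in itertools.combinations(range(n), k)
    ]
    shift = max(values)  # log-sum-exp shift for numerical stability
    return shift + math.log(sum(math.exp(v - shift) for v in values))


# Hypothetical toy instance: 4 candidate facilities (rows), 3 customers (columns).
weights = [
    [3.0, 0.5, 0.0],
    [0.0, 2.0, 1.0],
    [1.0, 1.0, 2.5],
    [0.5, 0.0, 0.5],
]
k = 2

base = greedy_base(weights, k)
f_greedy = facility_location(weights, base)
log_z = log_partition_brute_force(weights, k)
num_bases = math.comb(len(weights), k)

print("greedy base:", sorted(base), "F =", f_greedy)
print("log Z =", log_z)
print("elementary upper bound:", f_greedy + math.log(num_bases))
```

On this instance the greedy base happens to be the maximizer of F, so its value can stand in for max F in the sandwich; in general greedy only approximates the maximum, and enumeration is feasible only for very small ground sets, which is why tractable bounds on log Z are of interest.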
