From MAP to Marginals: Variational Inference in Bayesian Submodular Models

Submodular optimization has found many applications in machine learning and beyond. We carry out the first systematic investigation of inference in probabilistic models defined through submodular functions, generalizing regular pairwise MRFs and Determinantal Point Processes. In particular, we present L-FIELD, a variational approach to general log-submodular and log-supermodular distributions based on sub- and supergradients. We obtain both lower and upper bounds on the log-partition function, which enable us to compute probability intervals for marginals, conditionals and marginal likelihoods. We also obtain fully factorized approximate posteriors, at the same computational cost as ordinary submodular optimization. Our framework yields convex problems for optimizing over differentials of submodular functions, which we show how to solve optimally. We provide theoretical guarantees on the approximation quality in terms of the curvature of the function. We further establish natural relations between our variational approach and the classical mean-field method. Lastly, we empirically demonstrate the accuracy of our inference scheme on several submodular models.
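To make the idea of bounding the log-partition function with sub- and supergradients concrete, the following is a minimal sketch for a log-supermodular distribution P(A) ∝ exp(-F(A)) with F submodular and F(∅) = 0. It is not the L-FIELD procedure itself: it does not optimize over the differentials, and simply plugs in one arbitrary subgradient (a greedy vertex of the base polytope) and one simple modular upper bound (the sum of singleton values). The function `F`, the ground-set size `n`, and all helper names are assumptions chosen for illustration.

```python
import itertools
import math


def F(A):
    """Hypothetical submodular function: F(A) = sqrt(|A|), with F(empty) = 0."""
    return math.sqrt(len(A))


def log_partition_exact(F, n):
    """Brute-force log Z = log sum_A exp(-F(A)) over all 2^n subsets."""
    total = 0.0
    for r in range(n + 1):
        for A in itertools.combinations(range(n), r):
            total += math.exp(-F(A))
    return math.log(total)


def greedy_subgradient(F, order):
    """Marginal gains along `order`; for submodular F with F(empty) = 0
    the resulting modular function s satisfies s(A) <= F(A) for every A."""
    s = []
    prefix = []
    prev = F(prefix)
    for i in order:
        prefix.append(i)
        cur = F(prefix)
        s.append(cur - prev)
        prev = cur
    return s


def log_partition_modular(weights):
    """For a modular energy, sum_A exp(-w(A)) factorizes as prod_i (1 + exp(-w_i))."""
    return sum(math.log1p(math.exp(-w)) for w in weights)


n = 4  # assumed small ground set so that brute force is feasible

# s(A) <= F(A)  implies  exp(-F(A)) <= exp(-s(A)), giving an upper bound on log Z.
upper = log_partition_modular(greedy_subgradient(F, list(range(n))))

# F(A) <= sum_{i in A} F({i}) by subadditivity, giving a lower bound on log Z.
lower = log_partition_modular([F([i]) for i in range(n)])

exact = log_partition_exact(F, n)
print(f"lower={lower:.4f}  exact={exact:.4f}  upper={upper:.4f}")
```

Both bounds come from fully factorized (modular) surrogates, which is why each costs only a product over elements. Tightening them by optimizing over all sub- and supergradients is the part that the abstract attributes to the convex problems solved in the paper.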
