Scalable Variational Inference in Log-supermodular Models

We consider the problem of approximate Bayesian inference in log-supermodular models. These models encompass regular pairwise MRFs with binary variables, but also allow capturing higher-order interactions, which are intractable for existing approximate inference techniques such as belief propagation, mean field, and their variants. We show that a recently proposed variational approach to inference in log-supermodular models, L-FIELD, reduces to the widely studied minimum-norm-point problem for submodular minimization. This insight allows us to leverage powerful existing tools and hence to solve the variational problem orders of magnitude more efficiently than previously possible. We then provide another natural interpretation of L-FIELD, demonstrating that it exactly minimizes a specific Rényi divergence. This insight sheds light on the nature of the variational approximations produced by L-FIELD. Furthermore, we show how to perform parallel inference as message passing in a suitable factor graph at a linear convergence rate, without having to sum over all configurations of the factor. Finally, we apply our approach to a challenging image segmentation task. Our experiments confirm the scalability of our approach, the high quality of the resulting marginals, and the benefit of incorporating higher-order potentials.
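The reduction summarized above can be made concrete in a few lines. The sketch below is illustrative rather than the paper's implementation: assuming the standard L-FIELD setup p(S) ∝ exp(-F(S)) with F submodular, it approximates the optimal modular bound by the minimum-norm point of the base polytope B(F), found here with a plain Frank-Wolfe loop that uses Edmonds' greedy algorithm as the linear oracle, and then reads off factorized marginals as logistic transforms of that point's coordinates. The toy cut-plus-unaries function F, the fixed iteration count, and the Frank-Wolfe solver are assumptions made for brevity; the scalable solvers the abstract refers to are much faster in practice.

```python
# Illustrative sketch only: approximate marginals for a log-supermodular model
# p(S) ~ exp(-F(S)) via the minimum-norm point of the base polytope B(F).
# Frank-Wolfe with Edmonds' greedy oracle is used purely for simplicity; it is
# far slower than the specialized solvers a scalable implementation would use.
import numpy as np


def greedy_linear_oracle(F, w, n):
    """Edmonds' greedy algorithm: a vertex of B(F) minimizing <w, s>."""
    order = np.argsort(w)            # ascending weights => minimizing vertex
    s = np.zeros(n)
    chain, prev = [], F([])
    for i in order:
        chain.append(int(i))
        cur = F(chain)
        s[i] = cur - prev            # marginal gain along the greedy chain
        prev = cur
    return s


def min_norm_point(F, n, iters=2000):
    """Frank-Wolfe iterations for min_{s in B(F)} ||s||^2."""
    s = greedy_linear_oracle(F, np.zeros(n), n)
    for t in range(iters):
        v = greedy_linear_oracle(F, s, n)   # LMO for the gradient 2s
        gamma = 2.0 / (t + 2.0)
        s = (1.0 - gamma) * s + gamma * v
    return s


def approximate_marginals(F, n, iters=2000):
    """Factorized marginals q(X_i = 1) = sigmoid(-s*_i) from the min-norm point s*."""
    s_star = min_norm_point(F, n, iters)
    return 1.0 / (1.0 + np.exp(s_star))


if __name__ == "__main__":
    # Toy attractive pairwise model on 3 binary variables:
    # F(S) = sum of unary costs over S + 0.8 * (number of cut edges).
    unaries = np.array([0.5, -1.0, 0.2])
    edges = [(0, 1), (1, 2)]

    def F(S):
        S = set(S)
        cut = sum(1 for (i, j) in edges if (i in S) != (j in S))
        return float(sum(unaries[i] for i in S)) + 0.8 * cut

    print(approximate_marginals(F, n=3))
```

Because Frank-Wolfe only ever calls the greedy oracle, the same sketch applies unchanged to higher-order potentials (e.g. concave functions of region cardinalities), which is exactly the regime where the sum over all factor configurations would otherwise be intractable.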
