Differentiable Learning of Submodular Models

Can we incorporate discrete optimization algorithms within modern machine learning models? For example, is it possible to include in a deep architecture a layer whose output is the minimum cut of a parametrized graph? Because these models are trained end-to-end using gradient information, introducing such layers seems very challenging due to their discontinuous output. In this paper we focus on the problem of submodular minimization, for which we show that such layers are indeed possible. The key idea is that the output can be continuously relaxed without sacrificing guarantees. We provide an easily computable approximation to the Jacobian, complemented with a complete theoretical analysis. Finally, these contributions allow us to experimentally learn probabilistic log-supermodular models via a bi-level variational inference formulation.
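The continuous relaxation alluded to above is, in the standard submodular setting, the Lovász extension of the set function: a convex, piecewise-linear function whose minimizers recover the discrete minimizers and which can be evaluated by the greedy (Edmonds) algorithm. The Python sketch below only illustrates that building block under toy assumptions; the 3-node weight matrix W and the cut function are hypothetical examples, and this is not the paper's implementation.

    import numpy as np

    def lovasz_extension(F, x):
        # Greedy (Edmonds) evaluation of the Lovász extension f(x) of a set
        # function F on the ground set {0, ..., n-1}.  Sorting x in decreasing
        # order and accumulating marginal gains yields both the value f(x) and
        # a (sub)gradient of f at x; f is convex exactly when F is submodular,
        # which is what makes the relaxation usable in gradient-based training.
        order = np.argsort(-x)
        grad = np.zeros_like(x, dtype=float)
        chain, prev = [], F(frozenset())
        for k in order:
            chain.append(int(k))
            curr = F(frozenset(chain))
            grad[k] = curr - prev      # marginal gain of adding element k
            prev = curr
        return float(grad @ x), grad

    # Hypothetical toy example: the cut function of a 3-node weighted graph.
    W = np.array([[0.0, 1.0, 0.5],
                  [1.0, 0.0, 2.0],
                  [0.5, 2.0, 0.0]])

    def cut(S):
        inside = sorted(S)
        outside = [i for i in range(len(W)) if i not in S]
        if not inside or not outside:
            return 0.0
        return float(W[np.ix_(inside, outside)].sum())

    value, grad = lovasz_extension(cut, np.array([0.3, -0.1, 0.7]))
    print(value, grad)

In a learning setting one would then differentiate the minimizer of this relaxation with respect to the parameters of F, which is where the approximate Jacobian mentioned in the abstract enters.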
