Discriminative training of CRF models with probably submodular constraints

Problems of segmentation, denoising, registration, and 3D reconstruction are often addressed with the graph cut algorithm. However, minimizing a general pairwise energy is NP-hard: graph cuts yield exact solutions in polynomial time only when the pairwise potentials satisfy the submodularity inequality. In our learning paradigm, pairwise potentials are computed as the dot product of a learned weight vector w with positive feature vectors. To constrain such a model to remain tractable, previous approaches have required the weight vector to be positive for pairwise potentials in which the labels differ, and have set the pairwise potentials to zero in the case that the labels agree. Such constraints are sufficient to guarantee that the resulting pairwise potentials satisfy the submodularity inequality for every possible input. However, we show that this approach unnecessarily restricts the capacity of the learned models. Instead, we approach the problem of learning with submodularity constraints from a probabilistic setting. Prediction errors may result from learning error, model error, or inference error. Guaranteeing submodularity for all possible inputs, no matter how improbable, reduces inference error to effectively zero, but increases model error. In contrast, we relax the requirement of guaranteed submodularity to solutions that are submodular with high probability. We show that the conceptually simple strategy of enforcing submodularity on the training examples guarantees, with low sample complexity, that test images will also yield submodular pairwise potentials. Results are presented showing substantial improvement from the resulting increase in model capacity.
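To make the two constraint regimes concrete, the math block below restates the binary submodularity inequality and why the classical positivity constraints are sufficient; the symbols θ and φ are our notation, chosen to match the abstract's description of the linear model.

```latex
% Submodularity inequality for the pairwise potential \theta_{ij} on edge (i,j):
\[
  \theta_{ij}(0,1) + \theta_{ij}(1,0) \;\ge\; \theta_{ij}(0,0) + \theta_{ij}(1,1).
\]
% In the linear model \theta_{ij}(y_i, y_j) = w^\top \phi_{ij}(y_i, y_j) with
% \phi_{ij} \ge 0 elementwise, fixing \theta_{ij}(0,0) = \theta_{ij}(1,1) = 0
% and requiring w \ge 0 makes the left-hand side non-negative while the
% right-hand side is zero, so the inequality holds for every possible input.
```

The relaxed, "probably submodular" strategy instead collects one linear constraint per training edge and enforces only those. The sketch below is illustrative rather than the paper's implementation: the function names, the data layout, and the greedy half-space projection used to repair a weight vector are all assumptions; a full learner would fold the constraints A w ≥ 0 into its convex training objective (for example, alongside the margin constraints of a structural SVM).

```python
import numpy as np

def submodularity_constraints(edge_features):
    """Build one linear constraint per training edge.

    edge_features: dict mapping a label pair (yi, yj) to an array of shape
    (n_edges, d) of non-negative pairwise feature vectors. Submodularity of
    theta = w . phi requires, for every edge,
        w . (phi(0,1) + phi(1,0) - phi(0,0) - phi(1,1)) >= 0.
    Returns the (n_edges, d) matrix A of the constraints A @ w >= 0.
    """
    return (edge_features[(0, 1)] + edge_features[(1, 0)]
            - edge_features[(0, 0)] - edge_features[(1, 1)])

def project_probably_submodular(w, A, n_iters=1000):
    """Repair w by projecting onto the most violated half-space, repeatedly.

    A simple relaxation-method correction, used here only for illustration.
    The feasible set {w : A @ w >= 0} is a cone containing w = 0, so it is
    never empty.
    """
    w = w.copy()
    for _ in range(n_iters):
        viol = A @ w
        j = int(np.argmin(viol))
        if viol[j] >= 0:                      # every training edge submodular
            break
        a = A[j]
        w = w - (viol[j] / (a @ a)) * a       # project onto {w : a . w = 0}
    return w

# Toy usage: 4 training edges, d = 4 features per edge.
rng = np.random.default_rng(0)
feats = {pair: rng.uniform(0.0, 1.0, size=(4, 4))
         for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]}
A = submodularity_constraints(feats)
w = rng.normal(size=4)                        # unconstrained learned weights
w_ps = project_probably_submodular(w, A)
print((A @ w_ps).min())  # >= 0 (up to tolerance): submodular on all training edges
```

Note the contrast with the classical regime: instead of d sign constraints on w itself, the learner faces one constraint per observed edge, and the abstract's sample-complexity claim is exactly about how many such training constraints are needed before a fresh test edge also satisfies its inequality with high probability.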
