Constructive machine learning can frequently be formulated in the setting of structured prediction. Structured prediction is typically modeled using a compatibility function between inputs and outputs, together with a decoding step in which the compatibility function is optimized with respect to the output. When the compatibility function is equivalent to an (unnormalized) probability in a graphical model such as a random field, decoding is equivalent to MAP inference in that model. Random field models with loops require constraints, such as submodularity of the pairwise potentials, to ensure feasible test-time inference. Recently, a framework has been proposed in which a discriminative learning phase guarantees submodularity only probabilistically, enlarging the model space and explicitly trading off model error against inference error (Zaremba and Blaschko, 2016). A difficulty with this framework is that the optimization must enforce a set of constraints whose size is proportional to the total number of edges across all instances in the training set. In vision applications such as segmentation, for example, thousands of megapixel images in the training set can lead to billions of constraints during optimization. In this work, we show that a delayed constraint generation framework, built on a simple application of the Cauchy-Schwarz inequality and inexact pretraining, leads to a substantial reduction in the computational requirements for exact application of the framework. An experimental evaluation shows the computational efficiency of the proposed optimization framework.
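To make the role of the Cauchy-Schwarz inequality concrete, the following is a minimal sketch of how such a bound can screen constraints in a delayed constraint generation loop. It is an illustration under stated assumptions, not the method of the paper: we assume each constraint has the linear form w · x_e ≥ 0 for a per-edge feature vector x_e, and that the margins w_ref · x_e were computed once at a reference point w_ref. By Cauchy-Schwarz, w · x_e ≥ w_ref · x_e − ‖w − w_ref‖ ‖x_e‖, so any constraint whose lower bound is non-negative cannot be violated at w and need not be re-evaluated. The names `candidate_constraints`, `w_ref`, and `features` are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))


def candidate_constraints(w_ref, margins_ref, feature_norms, w_new):
    """Indices of constraints w . x_e >= 0 that may be violated at w_new.

    Assumed setting (not taken from the paper): margins_ref[e] = w_ref . x_e
    and feature_norms[e] = ||x_e|| were cached at the reference point.
    Cauchy-Schwarz gives w_new . x_e >= margins_ref[e] - ||w_new - w_ref|| * ||x_e||,
    so a non-negative lower bound certifies the constraint without touching x_e.
    """
    shift = norm([a - b for a, b in zip(w_new, w_ref)])
    return [
        e
        for e, (m, n) in enumerate(zip(margins_ref, feature_norms))
        if m - shift * n < 0
    ]


# Toy data: three edge feature vectors (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
w_ref = [1.0, 1.0]
margins_ref = [dot(w_ref, x) for x in features]  # cached once
feature_norms = [norm(x) for x in features]      # cached once

w_new = [1.0, -0.2]
candidates = candidate_constraints(w_ref, margins_ref, feature_norms, w_new)

# Only the candidates are checked exactly; the rest are certified by the bound.
violated = [e for e in candidates if dot(w_new, features[e]) < 0]
```

In this toy case the bound certifies the third constraint, so only the first two are evaluated exactly, and only the second is actually violated. The bound is conservative: it can flag constraints that turn out to be satisfied, but under the assumed linear form it never misses a violated one.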
[1] F. Barahona. On the computational complexity of Ising spin glass models. 1982.
[2] Olga Veksler et al. Fast Approximate Energy Minimization via Graph Cuts. IEEE Trans. Pattern Anal. Mach. Intell., 2001.
[3] Derek Hoiem et al. Learning CRFs Using Graph Cuts. ECCV, 2008.
[4] Thomas Hofmann et al. Large Margin Methods for Structured and Interdependent Output Variables. J. Mach. Learn. Res., 2005.
[5] J. Magnus et al. Matrix Differential Calculus with Applications in Statistics and Econometrics. 1991.
[6] David S. Johnson et al. Computers and Intractability: A Guide to the Theory of NP-Completeness. 1978.
[7] Nir Friedman et al. Probabilistic Graphical Models - Principles and Techniques. 2009.
[8] Wojciech Zaremba et al. Discriminative training of CRF models with probably submodular constraints. 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), 2016.
[9] Sven Behnke et al. PyStruct: learning structured prediction in python. J. Mach. Learn. Res., 2014.
[10] D. Greig et al. Exact Maximum A Posteriori Estimation for Binary Images. 1989.
[11] Thorsten Joachims et al. Cutting-plane training of structural SVMs. Machine Learning, 2009.