Efficient optimization for probably submodular constraints in CRFs

Constructive machine learning can frequently be formulated in the setting of structured prediction. Structured prediction is typically modeled using a compatibility function between inputs and outputs and a decoding step, in which an optimization over the compatibility function with respect to the output is performed. When the compatibility function is equivalent to an (unnormalized) probability in a graphical model such as a random field, decoding is equivalent to MAP inference in that model. Random field models with loops require constraints, such as submodularity of the pairwise potentials, to ensure tractable test-time inference. Recently, a framework has been proposed in which a discriminative learning phase only probabilistically guarantees submodularity, enlarging the model space and explicitly trading off between model error and inference error (Zaremba and Blaschko, 2016). A difficulty with this framework is that the optimization requires enforcing a set of constraints whose size is proportional to the total number of edges across all instances in the training set. In vision applications such as segmentation, for example, thousands of megapixel images in the training set can lead to billions of constraints during optimization. In this work, we show that a delayed constraint generation framework, built on a simple application of the Cauchy-Schwarz inequality and inexact pretraining, leads to a substantial reduction in the computational requirements for exact application of the framework. An experimental evaluation demonstrates the computational efficiency of the proposed optimization approach.
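
The constraints in question can be stated concretely: for a binary pairwise potential $\theta_e$, submodularity requires $\theta_e(0,0) + \theta_e(1,1) \leq \theta_e(0,1) + \theta_e(1,0)$, and for a linearly parameterized model $\theta_e(y_i, y_j) = \langle w, \phi_e(y_i, y_j) \rangle$ this is a linear constraint on $w$, one per edge. The sketch below is a minimal, hypothetical illustration of a delayed constraint generation loop over such linear constraints: solve over a small working set, scan for violated constraints, add them, and re-solve. All names, the toy penalized inner solver, and the parameters are assumptions for illustration; the paper's actual optimizer and its Cauchy-Schwarz-based component are not shown here.

```python
# Illustrative sketch of delayed constraint generation for (probably)
# submodular CRF learning. The objective_grad callable and the penalized
# subgradient inner solve are hypothetical stand-ins, not the paper's method.
import numpy as np

def fit_with_delayed_constraints(A, w0, objective_grad, lr=1e-2,
                                 rho=10.0, n_rounds=20, n_inner=200,
                                 tol=1e-6):
    """Cutting-plane-style loop.

    A has one row per edge constraint:
        a_e = phi_e(0,0) + phi_e(1,1) - phi_e(0,1) - phi_e(1,0),
    so submodularity of edge e requires <w, a_e> <= 0.
    """
    w = w0.copy()
    working = np.zeros(len(A), dtype=bool)  # active constraint mask
    for _ in range(n_rounds):
        # Inner solve: penalized (sub)gradient descent over the working set.
        for _ in range(n_inner):
            g = objective_grad(w)
            gaps = A[working] @ w
            viol = gaps > 0
            if viol.any():
                # Subgradient of rho * sum(max(0, <w, a_e>)) over violations.
                g = g + rho * A[working][viol].sum(axis=0)
            w -= lr * g
        # Separation step: scan all edge constraints, add the violated ones.
        new_viol = (A @ w > tol) & ~working
        if not new_viol.any():
            break  # all submodularity constraints satisfied
        working |= new_viol
    return w
```

The practical appeal of such a scheme is that only the constraints that actually become active during optimization are ever added to the working set, which is typically a small fraction of the billions of edge constraints in a large segmentation training set; inexact pretraining corresponds to starting the loop from a w0 obtained under loose tolerances.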