Convexification of Learning from Constraints

Regularized empirical risk minimization with constrained labels (as opposed to fixed labels) is a remarkably general abstraction of learning. For common loss and regularization functions, this optimization problem takes the form of a mixed-integer program (MIP) whose objective function is non-convex. In this form, the problem is resistant to standard optimization techniques. We construct MIPs that have the same solutions but convex objective functions. Specifically, we characterize the tightest convex extension of the objective function, which is given by its Legendre-Fenchel biconjugate. Computing values of this tightest convex extension is NP-hard. However, by applying our characterization to every function in an additive decomposition of the objective function, we obtain a class of looser convex extensions that can be computed efficiently. For some decompositions and for common loss and regularization functions, we derive closed forms of these extensions.
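The contrast between the tightest convex extension and the looser, decomposition-based one can be seen on a toy problem. The Python sketch below is only an illustration and is not the construction used in this work. It simplifies to a function of binary labels alone, y in {0,1}^n, taken to be +infinity elsewhere. For such a function, the Legendre-Fenchel biconjugate on the unit cube equals the convex envelope. Its value at a point x is the smallest weighted average of function values over label vectors whose convex combination equals x. The helper names (`solve_exact`, `convex_envelope`) and the XOR/XNOR example terms are hypothetical. The brute-force search over simplices enumerates C(2^n, n+1) subsets. That count only suggests why evaluating the tightest extension directly scales badly; it is not a proof of NP-hardness.

```python
from fractions import Fraction
from itertools import combinations, product


def solve_exact(A, b):
    """Solve the square system A z = b by Gauss-Jordan elimination in exact
    rational arithmetic. Returns None if A is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # column depends on earlier ones: singular
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


def convex_envelope(f, points, x):
    """Value at x of the tightest convex extension of f restricted to `points`.

    By Caratheodory's theorem, an optimal convex combination needs at most
    d + 1 affinely independent points. So we take the minimum over all
    (d + 1)-point simplices whose barycentric coordinates for x are
    nonnegative. This assumes `points` affinely span R^d, as {0,1}^d does."""
    d = len(x)
    best = None
    for simplex in combinations(points, d + 1):
        # Barycentric weights w: sum_k w_k * p_k = x and sum_k w_k = 1.
        A = [[p[i] for p in simplex] for i in range(d)] + [[1] * (d + 1)]
        w = solve_exact(A, list(x) + [1])
        if w is None or any(wk < 0 for wk in w):
            continue  # degenerate simplex, or x lies outside it
        value = sum(wk * f(p) for wk, p in zip(w, simplex))
        if best is None or value < best:
            best = value
    return best


# Hypothetical toy objective on two binary labels, split into two terms.
cube = list(product((0, 1), repeat=2))
xor = lambda y: int(y[0] != y[1])
xnor = lambda y: int(y[0] == y[1])
total = lambda y: xor(y) + xnor(y)  # equals 1 at every label vector

x = (Fraction(1, 2), Fraction(1, 2))
tight = convex_envelope(total, cube, x)  # tightest extension of the sum
loose = convex_envelope(xor, cube, x) + convex_envelope(xnor, cube, x)
print(tight, loose)  # → 1 0
```

The objective `xor + xnor` equals 1 at every label vector, so its tightest convex extension is 1 at the center of the square. The sum of the per-term extensions is 0 there. Both functions are convex and both agree with the objective at every binary point, but the decomposed one is a looser lower bound. Its payoff is that each term can be handled separately, which is what makes it cheap when each term depends on only a few labels.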