We propose a general approach for supervised learning with structured output spaces, such as combinatorial and polyhedral sets, that is based on minimizing estimated conditional risk functions. Given a loss function defined over pairs of output labels, we first estimate the conditional risk function by solving a (possibly infinite) collection of regularized least squares problems. A prediction is made by solving an auxiliary optimization problem that minimizes the estimated conditional risk function over the output space. We apply this method to a class of problems with discrete combinatorial outputs and additive pairwise losses, and show that the auxiliary problem can be solved efficiently by exact linear programming relaxations in several important cases, including variants of hierarchical multilabel classification and multilabel ranking problems. We demonstrate how the same approach can also be extended to vector regression problems with convex constraints and losses. Evaluations of this approach on hierarchical multilabel classification show that it compares favorably with several existing methods in terms of predictive accuracy, and has computational advantages over them when applied to large hierarchies.
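To make the two-stage structure concrete, below is a minimal sketch of the approach for a small finite output space, where the auxiliary problem can be solved by enumeration rather than the linear programming relaxations used in the paper. All names are hypothetical, and plain linear ridge regression stands in for the paper's (possibly kernelized) regularized least squares estimator.

```python
# Minimal sketch: estimate the conditional risk R(y | x) = E[L(y, Y) | X = x]
# by one ridge regression per candidate output y, then predict by minimizing
# the estimated risk over the output space. Names are illustrative only.
import numpy as np

def fit_conditional_risk(X, Y_obs, outputs, loss, lam=1e-2):
    """For each candidate output y, regress the observed losses
    L(y, Y_i) on the inputs X_i via regularized least squares."""
    n, d = X.shape
    A = X.T @ X + lam * np.eye(d)   # shared regularized Gram matrix
    weights = {}
    for y in outputs:
        targets = np.array([loss(y, yi) for yi in Y_obs])
        weights[y] = np.linalg.solve(A, X.T @ targets)
    return weights

def predict(x, weights):
    """Minimize the estimated conditional risk over the output space.
    Here this is brute-force enumeration; for the combinatorial output
    spaces in the paper this step is replaced by an exact LP relaxation."""
    return min(weights, key=lambda y: float(x @ weights[y]))

# Toy usage: binary outputs with 0-1 loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y_obs = (X[:, 0] > 0).astype(int)
w = fit_conditional_risk(X, Y_obs, outputs=[0, 1],
                         loss=lambda a, b: float(a != b))
print(predict(X[0], w))
```

The point of the sketch is the separation of concerns: the learning stage only ever fits scalar-valued risk regressions, while all the combinatorial structure of the output space is pushed into the prediction-time optimization.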