The Benefits of Modeling Slack Variables in SVMs

In this letter, we explore the idea of modeling slack variables in support vector machine (SVM) approaches. The study is motivated by SVM+, which models the slacks through a smooth correcting function determined by additional (privileged) information about the training examples that is not available in the test phase. We take a closer look at the meaning and consequences of smooth modeling of slacks, as opposed to determining them in an unconstrained manner through the SVM optimization program. To isolate this difference, we restrict both the determination and the modeling of slack values to the same information—that is, the same training input in the original input space. We also explore whether classification performance can be improved by combining (in a convex combination) the original SVM slacks with the modeled ones. We show experimentally that this approach not only leads to improved generalization performance but also yields more compact, lower-complexity models. Finally, we extend this idea to the context of ordinal regression, where a natural order among the classes exists. The experimental results confirm the principal findings from the binary case.
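The convex combination of slacks described above can be sketched in unconstrained form. Writing the mixed slack as s_i = μ·ξ_i + (1-μ)·ψ_i, with ξ_i the free SVM slack and ψ_i a modeled slack, optimizing ξ_i out turns the per-example cost into max(0, 1 − y_i f(x_i) − (1−μ)ψ_i) + (1−μ)ψ_i, which reduces to the standard hinge loss at μ = 1. The following is a minimal illustrative sketch, not the authors' formulation: the linear correcting function ψ(x) = max(0, u·x + d), the mixing parameter μ, and the plain subgradient-descent solver are all assumptions made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D binary data: two Gaussian blobs.
n = 100
X = np.vstack([rng.normal(-1.5, 1.0, (n, 2)), rng.normal(1.5, 1.0, (n, 2))])
y = np.concatenate([-np.ones(n), np.ones(n)])

def fit_mixed_slack_svm(X, y, C=1.0, gamma=1.0, mu=0.5, lr=1e-3, epochs=2000):
    """Subgradient descent on a linear SVM whose slack is the convex
    combination s_i = mu*xi_i + (1-mu)*psi_i of the free SVM slack xi_i
    and a modeled slack psi_i = max(0, u.x_i + d) (a hypothetical linear
    correcting function on the SAME input space).  With xi_i optimized
    out, the per-example cost is
        max(0, 1 - y_i f(x_i) - (1-mu)*psi_i) + (1-mu)*psi_i."""
    d_feat = X.shape[1]
    w = np.zeros(d_feat); b = 0.0   # decision function f(x) = w.x + b
    u = np.zeros(d_feat); d0 = 0.0  # correcting function parameters
    for _ in range(epochs):
        f = X @ w + b
        psi_raw = X @ u + d0
        psi = np.maximum(0.0, psi_raw)
        viol = 1.0 - y * f - (1 - mu) * psi
        active = viol > 0   # margin constraint still violated
        pos = psi_raw > 0   # modeled slack is active
        # Subgradients of J = 0.5||w||^2 + 0.5*gamma*||u||^2 + C*sum(cost_i)
        gw = w - C * (y[active, None] * X[active]).sum(axis=0)
        gb = -C * y[active].sum()
        # psi contributes +(1-mu)*psi always and -(1-mu)*psi inside the
        # hinge when it is active, so the two cancel on active examples.
        coef = (1 - mu) * pos.astype(float) * (1.0 - active.astype(float))
        gu = gamma * u + C * (coef[:, None] * X).sum(axis=0)
        gd = C * coef.sum()
        w -= lr * gw; b -= lr * gb
        u -= lr * gu; d0 -= lr * gd
    return w, b

w, b = fit_mixed_slack_svm(X, y, mu=0.5)
acc = np.mean(np.sign(X @ w + b) == y)
print(f"training accuracy: {acc:.2f}")
```

Setting mu=1.0 recovers plain hinge-loss SVM training, so the mixing parameter interpolates between unconstrained slacks and fully modeled ones, mirroring the convex combination studied in the letter.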
