Learning using privileged information: SVM+ and weighted SVM

Prior knowledge can be used to improve the predictive performance of learning algorithms or to reduce the amount of data required for training. The same goal is pursued within the learning using privileged information paradigm, which was recently introduced by Vapnik et al. and aims to exploit additional information that is available only at training time; SVM+ is an implementation of this framework. We relate privileged information to importance weighting and show that the prior knowledge expressible with privileged features can also be encoded by weights associated with every training example. We show that a weighted SVM can always replicate an SVM+ solution, while the converse is not true, and we construct a counterexample highlighting the limitations of SVM+. Finally, we touch on the problem of choosing weights for weighted SVMs when privileged features are not available.
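
To make the comparison concrete, the two optimization problems can be sketched in their commonly used linear primal forms. The notation is an assumption chosen for illustration and may differ from the paper's: $x_i$ are the ordinary features, $x_i^*$ the privileged features available only at training time, $y_i \in \{-1, +1\}$ the labels, $c_i \ge 0$ the per-example weights, and $C, \gamma > 0$ regularization parameters.

```latex
% Weighted SVM: every training example i carries its own cost c_i on its slack.
\begin{align*}
\min_{w,\, b,\, \xi} \quad & \tfrac{1}{2}\lVert w \rVert^2 + \sum_{i=1}^{n} c_i\, \xi_i \\
\text{s.t.} \quad & y_i \left( \langle w, x_i \rangle + b \right) \ge 1 - \xi_i,
\qquad \xi_i \ge 0, \qquad i = 1, \dots, n.
\end{align*}

% SVM+: the slack is not free but modeled as a linear function of the
% privileged features x_i^*, the so-called correcting function.
\begin{align*}
\min_{w,\, b,\, w^*,\, b^*} \quad & \tfrac{1}{2}\lVert w \rVert^2
+ \tfrac{\gamma}{2}\lVert w^* \rVert^2
+ C \sum_{i=1}^{n} \left( \langle w^*, x_i^* \rangle + b^* \right) \\
\text{s.t.} \quad & y_i \left( \langle w, x_i \rangle + b \right)
\ge 1 - \left( \langle w^*, x_i^* \rangle + b^* \right), \\
& \langle w^*, x_i^* \rangle + b^* \ge 0, \qquad i = 1, \dots, n.
\end{align*}
```

Read side by side, the standard SVM is the special case $c_i = C$ for all $i$ of the weighted problem, whereas SVM+ ties all slacks together through the shared parameters $(w^*, b^*)$. The weighted SVM instead allows an arbitrary nonnegative cost per example, which is one way to read the claim that it is the more expressive of the two.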
