Semi-Supervised Support Vector Machines

We introduce a semi-supervised support vector machine (S3VM) method. Given a training set of labeled data and a working set of unlabeled data, S3VM constructs a support vector machine using both the training and working sets. We use S3VM to solve the transduction problem via overall risk minimization (ORM) as posed by Vapnik. The transduction problem is to estimate the values of a classification function only at the given points of the working set. This contrasts with the standard inductive learning problem of estimating the classification function over the entire input space and then applying the fixed function to classify the working-set data. We propose a general S3VM model that minimizes both the misclassification error and the function capacity based on all the available data. We show how the S3VM model for 1-norm linear support vector machines can be converted to a mixed-integer program and then solved exactly with an integer programming solver. Results of S3VM and the standard 1-norm support vector machine approach are compared on ten data sets. Our computational results support the statistical learning theory results showing that incorporating working data improves generalization when insufficient training information is available. In every case, S3VM either improved generalization or showed no significant difference compared with the traditional approach.
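To make the mixed-integer conversion concrete, the sketch below builds a 1-norm linear S3VM model with the PuLP modeling library: each working-set point gets a binary variable that chooses its class, a big-M constant switches off the margin constraint of the class that was not chosen, and the 1-norm of the weight vector is kept linear by splitting it into nonnegative parts. This is a minimal sketch consistent with the description above, not the paper's exact formulation; the solver choice, the big-M device, and all names (s3vm_mip, C, M) are illustrative assumptions.

import numpy as np
import pulp


def s3vm_mip(X_lab, y_lab, X_unl, C=1.0, M=100.0):
    """Fit a 1-norm linear S3VM by solving a mixed-integer program.

    X_lab, X_unl are numpy arrays of shape (n, n_features); y_lab holds +/-1
    labels. The function name, the big-M device, and the default constants are
    assumptions for illustration, not the paper's exact model.
    """
    n_features = X_lab.shape[1]
    prob = pulp.LpProblem("S3VM", pulp.LpMinimize)

    # Split w into nonnegative parts so the 1-norm ||w||_1 stays linear.
    w_pos = [pulp.LpVariable(f"wp{k}", lowBound=0) for k in range(n_features)]
    w_neg = [pulp.LpVariable(f"wn{k}", lowBound=0) for k in range(n_features)]
    b = pulp.LpVariable("b")

    # One slack per labeled point; two slacks plus a binary class-assignment
    # variable per working-set (unlabeled) point.
    eta = [pulp.LpVariable(f"eta{i}", lowBound=0) for i in range(len(X_lab))]
    xi = [pulp.LpVariable(f"xi{j}", lowBound=0) for j in range(len(X_unl))]
    z = [pulp.LpVariable(f"z{j}", lowBound=0) for j in range(len(X_unl))]
    d = [pulp.LpVariable(f"d{j}", cat="Binary") for j in range(len(X_unl))]

    def score(x):
        return pulp.lpSum((w_pos[k] - w_neg[k]) * float(x[k]) for k in range(n_features)) + b

    # Objective: misclassification error on all points plus the 1-norm capacity term.
    prob += C * (pulp.lpSum(eta) + pulp.lpSum(xi) + pulp.lpSum(z)) \
        + pulp.lpSum(w_pos) + pulp.lpSum(w_neg)

    # Margin constraints for the labeled training points.
    for i in range(len(X_lab)):
        prob += float(y_lab[i]) * score(X_lab[i]) + eta[i] >= 1

    # For each working-set point, d[j] picks a class; the big-M term disables
    # the margin constraint of the class that was not chosen.
    for j in range(len(X_unl)):
        prob += score(X_unl[j]) + xi[j] + M * (1 - d[j]) >= 1
        prob += -score(X_unl[j]) + z[j] + M * d[j] >= 1

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    w = np.array([w_pos[k].value() - w_neg[k].value() for k in range(n_features)])
    return w, b.value()


# Toy usage: two labeled points and one working-set point.
# w, b = s3vm_mip(np.array([[0.0, 1.0], [0.0, -1.0]]), [1, -1], np.array([[0.2, 0.8]]))

At a solution, only one of the two slacks per working-set point can be nonzero, so penalizing their sum amounts to charging each working-set point the smaller of its two possible classification errors; the binary variable thereby assigns the point to the class on which it incurs the least error.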
