Multicategory Proximal Support Vector Machine Classifiers

Given a dataset, each element of which is labeled by one of k labels, we construct, by a very fast algorithm, a k-category proximal support vector machine (PSVM) classifier. Proximal support vector machines and related approaches (Fung & Mangasarian, 2001; Suykens & Vandewalle, 1999) can be interpreted as ridge regression applied to classification problems (Evgeniou, Pontil, & Poggio, 2000). Extensive computational results have shown the effectiveness of PSVM for two-class classification problems, where the separating plane is constructed in time that can be as much as two orders of magnitude shorter than that of conventional support vector machines. When PSVM is applied to problems with more than two classes, the well-known one-from-the-rest approach is a natural choice for taking advantage of its speed. This approach has a drawback, however: the resulting two-class problems are often very unbalanced, which in some cases leads to poor performance. To deal with this problem, we propose balancing the k classes and a novel Newton refinement modification to PSVM. Computational results indicate that these two modifications preserve the speed of PSVM while often leading to significant test set improvement over a plain PSVM one-from-the-rest application. The modified approach is considerably faster than other one-from-the-rest methods that use conventional SVM formulations, while still giving comparable test set correctness.
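As a rough illustration of the ridge-regression view and the one-from-the-rest scheme described above, the following Python sketch fits a linear PSVM for each class by solving one small regularized least-squares system. It is a sketch under stated assumptions, not the implementation evaluated here: the function names, the parameter value nu=100, and the synthetic data are hypothetical, and the balancing shown (weighting each squared error by the inverse of its class size) is only one plausible reading of "balancing the k classes". The Newton refinement step is not sketched.

```python
import numpy as np

def psvm_fit(A, d, nu=1.0, balance=False):
    """Linear PSVM for labels d in {-1, +1}.

    Minimizes (nu/2) * sum_i weight_i * (d_i - (a_i.w - gamma))^2
              + (1/2) * (w.w + gamma^2),
    which reduces to a single (n+1) x (n+1) linear solve.
    """
    m, n = A.shape
    E = np.hstack([A, -np.ones((m, 1))])      # E = [A, -e]
    weights = np.ones(m)
    if balance:
        # Assumption: balance by inverse class size, so both classes
        # contribute equally to the error term.
        n_pos = np.sum(d > 0)
        n_neg = m - n_pos
        weights = np.where(d > 0, 1.0 / n_pos, 1.0 / n_neg)
    EtW = E.T * weights                       # scales column i of E.T by weight_i
    z = np.linalg.solve(np.eye(n + 1) / nu + EtW @ E, EtW @ d)
    return z[:-1], z[-1]                      # w, gamma

def one_from_rest_fit(A, labels, nu=1.0, balance=True):
    """Train one PSVM per class: that class (+1) against all others (-1)."""
    classes = np.unique(labels)
    models = [psvm_fit(A, np.where(labels == c, 1.0, -1.0), nu, balance)
              for c in classes]
    W = np.column_stack([w for w, _ in models])
    gammas = np.array([g for _, g in models])
    return classes, W, gammas

def one_from_rest_predict(X, classes, W, gammas):
    """Assign each row of X to the class with the largest value of x.w - gamma."""
    scores = X @ W - gammas
    return classes[np.argmax(scores, axis=1)]

# Hypothetical unbalanced 3-class toy problem (synthetic data, not from the paper).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
sizes = [100, 20, 20]
A = np.vstack([c + 0.5 * rng.standard_normal((s, 2))
               for c, s in zip(centers, sizes)])
labels = np.repeat(np.arange(3), sizes)

classes, W, gammas = one_from_rest_fit(A, labels, nu=100.0, balance=True)
predicted = one_from_rest_predict(A, classes, W, gammas)
accuracy = np.mean(predicted == labels)
print("training accuracy:", accuracy)
```

Because the balanced weights sum to 2 rather than to the number of points, a value of nu tuned for the unweighted problem would likely need to be retuned when balance=True.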

[1] Jason Weston, et al. Multi-Class Support Vector Machines, 1998.

[2] Catherine Blake, et al. UCI Repository of machine learning databases, 1998.

[3] Matthias W. Seeger, et al. Using the Nyström Method to Speed Up Kernel Machines, 2000, NIPS.

[4] Johan A. K. Suykens, et al. Multiclass least squares support vector machines, 1999, IJCNN'99.

[5] Volker Roth, et al. Nonlinear Discriminant Analysis Using Kernel Functions, 1999, NIPS.

[6] Gunnar Rätsch, et al. Input space versus feature space in kernel-based methods, 1999, IEEE Trans. Neural Networks.

[7] Bernhard Schölkopf, et al. Sparse Greedy Matrix Approximation for Machine Learning, 2000, International Conference on Machine Learning.

[8] O. Nelles, et al. An Introduction to Optimization, 1996, IEEE Antennas and Propagation Magazine.

[9] Yuh-Jye Lee, et al. SSVM: A Smooth Support Vector Machine for Classification, 2001, Comput. Optim. Appl.

[10] R. Tyrrell Rockafellar, et al. Convex Analysis, 1970, Princeton Landmarks in Mathematics and Physics.

[11] Sanjoy Dasgupta, et al. Experiments with Random Projection, 2000, UAI.

[12] Robert A. Lordo, et al. Learning from Data: Concepts, Theory, and Methods, 2001, Technometrics.

[13] A. E. Hoerl, et al. Ridge regression: biased estimation for nonorthogonal problems, 2000.

[14] Johan A. K. Suykens, et al. Least Squares Support Vector Machine Classifiers, 1999, Neural Processing Letters.

[15] Tomaso A. Poggio, et al. Regularization Networks and Support Vector Machines, 2000, Adv. Comput. Math.

[16] J. Hiriart-Urruty, et al. Generalized Hessian matrix and second-order optimality conditions for problems with C1,1 data, 1984.

[17] Olvi L. Mangasarian, et al. Nonlinear Programming, 1969.

[18] Nello Cristianini, et al. Support vector machine classification and validation of cancer tissue samples using microarray expression data, 2000, Bioinform.

[19] Susan Eitelman, et al. Matlab Version 6.5 Release 13. The MathWorks, Inc., 3 Apple Hill Dr., Natick, MA 01760-2098; 508/647-7000, Fax 508/647-7001, www.mathworks.com, 2003.

[20] Glenn Fung, et al. Proximal support vector machine classifiers, 2001, KDD '01.

[21] Kristin P. Bennett, et al. Multicategory Classification by Support Vector Machines, 1999, Comput. Optim. Appl.

[22] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[23] A. N. Tikhonov, et al. Solutions of ill-posed problems, 1977.

[24] O. Mangasarian, et al. Multicategory discrimination via linear programming, 1994.

[25] C. Kanzow, et al. On the Minimum Norm Solution of Linear Programs, 2003.

[26] Francisco Facchinei, et al. Minimization of SC1 functions and the Maratos effect, 1995, Oper. Res. Lett.

[27] Yuh-Jye Lee, et al. RSVM: Reduced Support Vector Machines, 2001, SDM.

[28] Olvi L. Mangasarian, et al. Hybrid misclassification minimization, 1996, Adv. Comput. Math.

[29] O. Mangasarian, et al. Massive data discrimination via linear support vector machines, 2000.

[30] Olvi L. Mangasarian, et al. Generalized Support Vector Machines, 1998.

[31] Johan A. K. Suykens, et al. Least Squares Support Vector Machines, 2002.

[32] Johan A. K. Suykens, et al. Multiclass LS-SVMs: Moderated Outputs and Coding-Decoding Schemes, 2002, Neural Processing Letters.

[33] Harris Drucker, et al. A Case Study in Handwritten Digit Recognition, 1994.

[34] James Demmel, et al. LAPACK Users' Guide, Third Edition, 1999, Software, Environments and Tools.

[35] Johan A. K. Suykens, et al. Least squares support vector machine classifiers: a large scale algorithm, 1999.