Multicategory Classification by Support Vector Machines

We examine the problem of how to discriminate between objects of three or more classes. Specifically, we investigate how two-class discrimination methods can be extended to the multiclass case. We show how the linear programming (LP) approaches based on the work of Mangasarian and the quadratic programming (QP) approaches based on Vapnik's Support Vector Machine (SVM) can be combined to yield two new approaches to the multiclass problem. In LP multiclass discrimination, a single linear program is used to construct a piecewise-linear classification function. In our proposed multiclass SVM method, a single quadratic program is used to construct a piecewise-nonlinear classification function, each piece of which can take the form of a polynomial, a radial basis function, or even a neural network. For problems with k > 2 classes, the SVM method as originally proposed required constructing k two-class SVMs, each separating one class from the remaining classes. Similarly, k two-class linear programs can be used for the multiclass problem. We performed an empirical study of the original single-LP method, the proposed k-LP method, the proposed single-QP method, and the original k-QP method. We discuss the advantages and disadvantages of each approach.
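The idea of building a piecewise-linear classifier from a single linear program can be sketched as follows. This is a minimal illustration, not the method as formulated in the paper. It assumes a robust-LP formulation in the style of Bennett and Mangasarian's multicategory discrimination, with one linear discriminant g_i(x) = x·w_i − γ_i per class. For every ordered pair i ≠ j and every training point of class i, the sketch requires g_i(x) − g_j(x) ≥ 1 up to a nonnegative slack, and it minimizes the slacks weighted by 1/m_i. It uses NumPy and SciPy's `linprog` with the HiGHS solver. The names `fit_multiclass_lp` and `predict` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def fit_multiclass_lp(X, labels, k):
    """Fit k linear discriminants g_i(x) = x @ w_i - gamma_i with ONE linear program.

    Assumed formulation (robust-LP style, may differ from the paper): for every
    ordered pair i != j and every training point x of class i, require
        (x @ w_i - gamma_i) - (x @ w_j - gamma_j) + y >= 1,   y >= 0,
    and minimize the sum of the slacks y, each weighted by 1 / m_i
    (m_i = number of training points in class i).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[1]
    n_wg = k * (n + 1)  # w_i (n entries) and gamma_i (1 entry) per class
    pairs = [(i, j) for i in range(k) for j in range(k) if i != j]
    n_slack = int(sum(np.sum(labels == i) for i, _ in pairs))

    c = np.zeros(n_wg + n_slack)
    A_ub = np.zeros((n_slack, n_wg + n_slack))
    r = 0  # current constraint row; also indexes that row's slack variable
    for i, j in pairs:
        A_i = X[labels == i]
        for x in A_i:
            # Rewritten as "<=" for linprog:
            # -x @ w_i + gamma_i + x @ w_j - gamma_j - y <= -1
            A_ub[r, i * (n + 1):i * (n + 1) + n] = -x
            A_ub[r, i * (n + 1) + n] = 1.0
            A_ub[r, j * (n + 1):j * (n + 1) + n] = x
            A_ub[r, j * (n + 1) + n] = -1.0
            A_ub[r, n_wg + r] = -1.0
            c[n_wg + r] = 1.0 / len(A_i)
            r += 1
    b_ub = -np.ones(n_slack)
    # w and gamma are free variables; slacks are nonnegative.
    bounds = [(None, None)] * n_wg + [(0, None)] * n_slack
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    params = res.x[:n_wg].reshape(k, n + 1)
    return params[:, :n], params[:, n]  # W has shape (k, n), gamma has shape (k,)


def predict(W, gamma, X):
    """Piecewise-linear rule: assign x to the class with the largest g_i(x)."""
    return np.argmax(np.asarray(X, dtype=float) @ W.T - gamma, axis=1)


if __name__ == "__main__":
    # Toy data: three well-separated clusters in the plane.
    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [0, 5], [0, 6], [1, 5]]
    y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    W, gamma = fit_multiclass_lp(X, y, k=3)
    print(predict(W, gamma, [[0.5, 0.5], [5.3, 5.3], [0.3, 5.3]]))  # [0 1 2]
```

On this toy data the classes are piecewise-linearly separable, so the LP attains objective 0. Every difference g_i − g_j is affine and is at least 1 on the class-i training points, so it stays at least 1 on their convex hull. That guarantees the three query points are labeled 0, 1, 2. Replacing each linear g_i with a kernel expansion, such as a polynomial or radial basis function, would give a piecewise-nonlinear rule of the kind described for the QP method. The quadratic program itself is not shown here. The k-LP and k-QP alternatives instead solve k separate one-class-versus-rest problems and combine them with the same argmax.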
