GenSVM: A Generalized Multiclass Support Vector Machine

Traditional extensions of the binary support vector machine (SVM) to multiclass problems are either heuristics or require solving a large dual optimization problem. Here, a generalized multiclass SVM called GenSVM is proposed. In this method, classification boundaries for a K-class problem are constructed in a (K - 1)-dimensional space using a simplex encoding. Additionally, several different weightings of the misclassification errors are incorporated in the loss function, so that it generalizes three existing multiclass SVMs through a single optimization problem. An iterative majorization algorithm is derived that solves the optimization problem without the need for a dual formulation. This algorithm has the advantage that it can use warm starts during cross-validation and during a grid search, which significantly speeds up the training phase. Rigorous numerical experiments compare linear GenSVM with seven existing multiclass SVMs on both small and large data sets. These comparisons show that the proposed method is competitive with existing methods in both predictive accuracy and training time, and that it significantly outperforms several existing methods on these criteria.
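The simplex encoding can be illustrated with a small sketch. It assumes one standard construction of a regular simplex: K vertices in (K - 1)-dimensional space, centered at the origin, with every pair of vertices at distance 1. It also assumes that a point already projected into that space is assigned to the class of its nearest vertex. The function names `simplex_vertices` and `predict` are hypothetical, and the exact matrices used in GenSVM may differ from this construction.

```python
import math


def simplex_vertices(K):
    """Return K vertices of a regular simplex in R^(K-1) (row k = class k).

    Assumed construction, not necessarily GenSVM's exact matrix: for
    1-based row k and column l,
      u_kl = -1 / sqrt(2(l^2 + l))  if k <= l
      u_kl =  l / sqrt(2(l^2 + l))  if k == l + 1
      u_kl =  0                     otherwise.
    Each column sums to zero (centered) and all edges have length 1.
    """
    if K < 2:
        raise ValueError("need at least two classes")
    U = [[0.0] * (K - 1) for _ in range(K)]
    for l in range(1, K):              # column index, 1-based
        denom = math.sqrt(2 * (l * l + l))
        for k in range(1, K + 1):      # row index, 1-based
            if k <= l:
                U[k - 1][l - 1] = -1.0 / denom
            elif k == l + 1:
                U[k - 1][l - 1] = l / denom
            # rows with k > l + 1 keep the value 0
    return U


def predict(z, U):
    """Assumed decision rule: return the (0-based) index of the vertex
    in U that is nearest to z in squared Euclidean distance."""
    dists = [sum((zi - ui) ** 2 for zi, ui in zip(z, u)) for u in U]
    return min(range(len(U)), key=dists.__getitem__)


if __name__ == "__main__":
    U = simplex_vertices(3)
    print(U)                       # three vertices of an equilateral triangle
    print(predict([0.1, 0.5], U))  # 2: closest to the vertex (0, 1/sqrt(3))
```

For K = 3 this construction yields the vertices (-1/2, -1/(2√3)), (1/2, -1/(2√3)) and (0, 1/√3). Because the vertices are equidistant and centered, no class is favored by the geometry of the encoding alone, and each object needs only K - 1 coordinates rather than one score per class.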
