An Evaluation of Discrete Support Vector Machines for Cost-Sensitive Learning

The problem of cost-sensitive learning involves classification analysis in scenarios where different error types are associated with asymmetric misclassification costs. Business applications and problems of medical diagnosis are prominent examples, and pattern recognition techniques are routinely used to support decision making within these fields. In particular, support vector machines (SVMs) have been applied successfully, e.g., to evaluate customer creditworthiness in credit scoring or to detect tumorous cells in bio-molecular data analysis. However, ordinary SVMs minimize a continuous approximation of the classification error, giving similar importance to each error type. While several modifications have been proposed to make SVMs cost-sensitive, the impact of the approximate error measurement is normally not considered. Recently, Orsenigo and Vercellis introduced a discrete SVM (DSVM) formulation [1] that minimizes misclassification errors directly and thereby overcomes possible limitations of an error proxy. In particular, DSVM facilitates explicit cost minimization, which makes this technique a promising candidate for cost-sensitive learning. Consequently, we compare DSVM with a standard procedure for cost-sensitive SVMs and investigate to what extent improvements in terms of misclassification costs are achievable. While the standard SVM performs remarkably well, DSVM is found to give superior results.
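The contrast the abstract draws, between the continuous error surrogate of ordinary SVMs and the discrete, cost-weighted error count that DSVM-style formulations target, can be illustrated with a minimal sketch. The margins, labels, and cost ratio below are illustrative assumptions, not data or results from the paper: for fixed candidate classifiers, the hinge loss and the cost-weighted 0/1 loss can rank the same two solutions differently.

```python
# Hedged sketch: the continuous hinge-loss surrogate minimized by
# ordinary SVMs vs. the discrete, cost-weighted 0/1 error that a
# DSVM-style formulation minimizes directly. All numbers below are
# illustrative assumptions, not taken from the paper.

def hinge_loss(margins):
    """Continuous surrogate: sum of max(0, 1 - y*f(x)) over all points."""
    return sum(max(0.0, 1.0 - m) for m in margins)

def weighted_01_loss(margins, labels, cost_fn, cost_fp):
    """Discrete cost: each misclassification weighted by its error type."""
    total = 0.0
    for m, y in zip(margins, labels):
        if m <= 0:  # non-positive margin y*f(x) means a misclassification
            total += cost_fn if y == +1 else cost_fp
    return total

# Margins y_i * f(x_i) of two candidate classifiers on the same four points.
labels = [+1, +1, -1, -1]
clf_a = [0.2, 0.3, 1.5, 1.5]   # no errors, but two small positive margins
clf_b = [-0.1, 2.0, 2.0, 2.0]  # one false negative, large margins elsewhere

# Assume a false negative costs five times a false positive.
print("hinge:   ", hinge_loss(clf_a), hinge_loss(clf_b))
print("weighted:", weighted_01_loss(clf_a, labels, 5.0, 1.0),
      weighted_01_loss(clf_b, labels, 5.0, 1.0))
```

Here the surrogate favors the second classifier (its total hinge loss is lower), while the discrete cost-weighted error favors the first, which makes no mistakes at all; this divergence is exactly the "error proxy" limitation the abstract attributes to ordinary SVMs.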

[1]  Wei-Yin Loh,et al.  A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms , 2000, Machine Learning.

[2]  Carlo Vercellis,et al.  Discrete support vector decision trees via tabu search , 2004, Comput. Stat. Data Anal..

[3]  D. J. Newman,et al.  UCI Repository of Machine Learning Databases , 1998 .

[4]  Sungzoon Cho,et al.  Response modeling with support vector machines , 2006, Expert Syst. Appl..

[5]  Fred W. Glover,et al.  Tabu Search - Part I , 1989, INFORMS J. Comput..

[6]  Pedro M. Domingos MetaCost: a general method for making classifiers cost-sensitive , 1999, KDD '99.

[7]  F. Glover,et al.  Candidate List and Exploration Strategies for Solving 0/1 Mip Problems Using a Pivot Neighborhood , 1999 .

[8]  Kristin P. Bennett,et al.  Hybrid extreme point tabu search , 1998, Eur. J. Oper. Res..

[9]  Guido Dedene,et al.  Cost-sensitive learning and decision making revisited , 2005, Eur. J. Oper. Res..

[10]  Edward Y. Chang,et al.  KBA: kernel boundary alignment considering imbalanced data distribution , 2005, IEEE Transactions on Knowledge and Data Engineering.

[11]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[12]  Fabio Roli,et al.  Cost-sensitive Learning in Support Vector Machines , 2002 .

[13]  Gary M. Weiss  Mining with rarity: a unifying framework , 2004, SIGKDD Explor..

[14]  Salvatore J. Stolfo,et al.  AdaCost: Misclassification Cost-Sensitive Boosting , 1999, ICML.

[15]  Chih-Jen Lin,et al.  A Practical Guide to Support Vector Classification , 2008 .

[16]  Yuval Rabani,et al.  Linear Programming , 2007, Handbook of Approximation Algorithms and Metaheuristics.

[17]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[18]  Charles Elkan,et al.  The Foundations of Cost-Sensitive Learning , 2001, IJCAI.

[19]  Yi Lin,et al.  Support Vector Machines for Classification in Nonstandard Situations , 2002, Machine Learning.

[20]  Igor V. Tetko,et al.  Gene selection from microarray data for cancer classification - a machine learning approach , 2005, Comput. Biol. Chem..

[21]  Emilio Carrizosa,et al.  Two-group classification via a biobjective margin maximization model , 2006, Eur. J. Oper. Res..

[22]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[23]  Johan A. K. Suykens,et al.  Benchmarking state-of-the-art classification algorithms for credit scoring , 2003, J. Oper. Res. Soc..

[24]  Johan A. K. Suykens,et al.  Benchmarking Least Squares Support Vector Machine Classifiers , 2004, Machine Learning.

[25]  Paul S. Bradley,et al.  Feature Selection via Concave Minimization and Support Vector Machines , 1998, ICML.

[26]  Nathalie Japkowicz,et al.  The class imbalance problem: A systematic study , 2002, Intell. Data Anal..

[27]  Edward Y. Chang,et al.  Class-Boundary Alignment for Imbalanced Dataset Learning , 2003 .

[28]  David J. Spiegelhalter,et al.  Machine Learning, Neural and Statistical Classification , 2009 .

[29]  Jason Weston,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2002, Machine Learning.

[30]  L. Thomas A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers , 2000 .

[31]  Fred W. Glover,et al.  Solving zero-one mixed integer programming problems using tabu search , 1998, European Journal of Operational Research.

[32]  Nello Cristianini,et al.  Controlling the Sensitivity of Support Vector Machines , 1999 .

[33]  Kristin P. Bennett,et al.  On support vector decision trees for database marketing , 1999, IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339).

[34]  Ulf Brefeld,et al.  Perceptron and SVM learning with generalized cost models , 2004, Intell. Data Anal..

[35]  David West,et al.  Neural network credit scoring models , 2000, Comput. Oper. Res..

[36]  John Shawe-Taylor,et al.  Optimizing Classifiers for Imbalanced Training Sets , 1998, NIPS.

[37]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[38]  Yi Lin,et al.  Support Vector Machines and the Bayes Rule in Classification , 2002, Data Mining and Knowledge Discovery.

[39]  Fred Glover,et al.  Tabu Search - Part II , 1989, INFORMS J. Comput..

[40]  Fred W. Glover,et al.  Tabu search - wellsprings and challenges , 1998, Eur. J. Oper. Res..