Imbalanced SVM Learning with Margin Compensation

This paper surveys previous solutions to the imbalanced-dataset learning problem in support vector machines (SVMs) and proposes a new solution based on cost-sensitive learning. The general idea of the cost-sensitive approach is to adopt an inversely proportional penalization scheme, in which each class is penalized in inverse proportion to its size, yielding a penalty-regularized model. The paper further adds margin compensation to obtain a more accurate solution. The margin plays an important role in determining the decision boundary, which motivates producing an imbalanced margin between the two classes so that the decision boundary can shift. The imbalanced margin thus compensates the overwhelmed (minority) class; this is what the paper calls margin compensation. Combined with penalty regularization, margin compensation can moderately calibrate the decision boundary and refine a biased boundary. This reduces the need for a high penalty on the minority class and thereby lowers the risk of overfitting. Experimental results show promising potential for future applications.
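The abstract combines two mechanisms: class-dependent penalties that are inversely proportional to class size, and an imbalanced margin that lets the boundary shift. The Python sketch below illustrates both under explicit assumptions and is not the paper's formulation. The following choices are all assumptions made for brevity:

- the per-class penalty rule C_k = C * n / (2 * n_k);
- modelling margin compensation as a larger required functional margin, 1 + delta, for the minority class;
- the squared hinge loss;
- plain full-batch gradient descent on the primal, used instead of the usual dual QP solver.

The function names `inverse_class_costs`, `train_margin_compensated_svm` and `predict` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def inverse_class_costs(y: Sequence[int], C: float = 1.0) -> Dict[int, float]:
    """Per-class penalty inversely proportional to class size (assumed rule).

    C_k = C * n / (2 * n_k), so both classes carry the same total penalty.
    """
    n = len(y)
    costs = {}
    for label in (-1, 1):
        n_k = sum(1 for t in y if t == label)
        if n_k == 0:
            raise ValueError(f"class {label} has no samples")
        costs[label] = C * n / (2 * n_k)
    return costs


def train_margin_compensated_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    C: float = 1.0,
    lam: float = 0.01,
    delta: float = 0.0,
    minority: int = 1,
    lr: float = 0.05,
    epochs: int = 5000,
) -> Tuple[List[float], float]:
    """Linear SVM with class-dependent costs and margins (hypothetical sketch).

    Minimises, by full-batch gradient descent,
        lam/2 * ||w||^2 + (1/n) * sum_i c_i * max(0, m_i - y_i * (w.x_i + b))^2
    where c_i comes from inverse_class_costs and the assumed margin target
    m_i is 1 + delta for the minority class and 1 for the majority class.
    """
    n, d = len(X), len(X[0])
    cost = inverse_class_costs(y, C)
    margin = {label: 1.0 + (delta if label == minority else 0.0) for label in (-1, 1)}
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        # Gradient of the L2 regulariser (the bias is not regularised).
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            slack = margin[yi] - yi * score
            if slack > 0:
                # d/dw of c * slack^2 is -2 * c * slack * y * x; for b drop the x.
                coef = -2.0 * cost[yi] * slack * yi / n
                for j in range(d):
                    grad_w[j] += coef * xi[j]
                grad_b += coef
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Sign of the linear decision function (ties go to +1)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    # Toy 1-D data: 8 majority (-1) points and 2 minority (+1) points.
    X = [[-3.5], [-3.0], [-2.5], [-2.0], [-1.5], [-1.0], [-0.5], [0.0], [3.0], [4.0]]
    y = [-1] * 8 + [1] * 2
    for delta in (0.0, 1.0):
        w, b = train_margin_compensated_svm(X, y, delta=delta)
        print(f"delta={delta}: threshold -b/w = {-b / w[0]:.3f}")
```

With delta = 0 the sketch reduces to a cost-sensitive SVM with a symmetric margin. Raising delta asks minority points to sit farther from the boundary, which pushes the boundary toward the majority class without raising the minority penalty. On the toy data, the threshold -b/w moves from roughly 1.49 to roughly 0.98. The default step size suits small, unscaled features like these; larger feature values may need a smaller `lr`.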
