Asymmetric Kernel Scaling for Imbalanced Data Classification

Many critical application domains raise the problem of imbalanced learning, that is, classification from imbalanced data. Conventional techniques produce biased results, because the over-represented class dominates the learning process and tends to attract predictions. As a consequence, the false negative rate may be unacceptably high and the chosen classifier unusable. We propose a classification procedure based on Support Vector Machines that copes effectively with data imbalance. Starting from a first-step approximate solution and then applying a suitable kernel transformation, we asymmetrically enlarge the space around the class boundary, compensating for data skewness. Results show that with moderate imbalance the performance is comparable to that of a standard SVM, while with heavily skewed data the proposed approach outperforms its competitors.
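The two-step idea can be illustrated with a minimal sketch, under stated assumptions rather than as the exact procedure of the paper. It assumes a conformal rescaling of the form K~(x, y) = D(x) K(x, y) D(y), where D(x) = exp(-k f(x)^2) and f is the decision function of the first-step classifier; the asymmetry is modelled by choosing a different decay rate k on each side of the boundary. The function `first_pass_decision`, the rates `k_minority` and `k_majority`, and the choice of which side decays more slowly are all hypothetical and serve only to illustrate the mechanism.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Standard Gaussian (RBF) kernel used as the base kernel K(x, y)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def first_pass_decision(x):
    """Hypothetical stand-in for the decision function f(x) of a first,
    standard SVM. Here f(x) >= 0 is taken to be the minority side."""
    return x[0] + x[1] - 1.0


def scaling_factor(x, decision, k_minority=0.2, k_majority=1.0):
    """Assumed scaling D(x) = exp(-k * f(x)^2), with a side-dependent rate k.
    D equals 1 on the boundary (f = 0) and decays away from it; the slower
    decay on the minority side is an illustrative assumption."""
    f = decision(x)
    k = k_minority if f >= 0 else k_majority
    return math.exp(-k * f * f)


def scaled_kernel(x, y, decision, base_kernel=rbf_kernel):
    """Conformally rescaled kernel K~(x, y) = D(x) * K(x, y) * D(y).
    It remains symmetric and positive semi-definite whenever K is."""
    return (scaling_factor(x, decision)
            * base_kernel(x, y)
            * scaling_factor(y, decision))


# Three illustrative points: one on each side of the boundary at the same
# distance |f| = 1, and one exactly on the boundary.
a = (1.5, 0.5)   # f(a) = +1, minority side
b = (0.5, -0.5)  # f(b) = -1, majority side
c = (0.5, 0.5)   # f(c) = 0, on the boundary

if __name__ == "__main__":
    for name, point in (("a", a), ("b", b), ("c", c)):
        print(name, scaling_factor(point, first_pass_decision))
    print("K~(a, b) =", scaled_kernel(a, b, first_pass_decision))
```

In this sketch the points `a` and `b` lie at the same distance from the first-step boundary but receive different scaling factors, which is the asymmetry the abstract refers to. In a full pipeline, the rescaled kernel would be passed to a second SVM training run.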
