Improving SVM Classification with Imbalance Data Set

SMOTE (Synthetic Minority Over-sampling Technique) interpolates new minority samples in the input space, whereas a Support Vector Machine (SVM) separates the classes in a kernel-induced feature space, and this mismatch can make the synthetic samples inconsistent with the space in which the classifier actually works. To address this, the paper presents a kernel-based SMOTE approach for SVM classification of imbalanced data sets. The method first oversamples the minority instances in the feature space, then finds the pre-images of the synthetic samples using a distance relation between the feature space and the input space. Finally, these pre-images are appended to the original data set to train an SVM. Experiments on real data indicate that the samples constructed by the proposed method are of higher quality than those produced by SMOTE and, as a result, improve the effectiveness of SVM classification on imbalanced data sets.
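The three steps above (oversample in feature space, map back through a distance relation, append and train) can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on assumptions the abstract does not state: an RBF kernel K(x, y) = exp(-γ‖x - y‖²), neighbours chosen among the minority samples only, and a pre-image step in the style of Kwok and Tsang's distance-based method, where for the RBF kernel the feature-space squared distance d̃² maps to the input-space squared distance d² = -ln(1 - d̃²/2)/γ. The function names `rbf_kernel`, `preimage_from_distances`, `kernel_smote` and `augment`, and the parameters `gamma`, `k`, `n_pre` and `seed`, are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Assumed RBF kernel: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def preimage_from_distances(neighbors, d2):
    """Kwok-Tsang style pre-image: find x with ||x - neighbors[k]||^2 close to d2[k]."""
    mean = neighbors.mean(axis=0)
    H = (neighbors - mean).T                      # dim x n matrix of centred neighbours
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    keep = s > 1e-10 * s.max()                    # drop degenerate directions
    U, s, Vt = U[:, keep], s[keep], Vt[keep]
    Z = s[:, None] * Vt                           # neighbour coordinates in the U basis
    d0 = (Z ** 2).sum(axis=0)                     # squared norms of those coordinates
    z = -0.5 * (Vt @ (d2 - d0)) / s               # z = -1/2 diag(1/s) V^T (d2 - d0)
    return U @ z + mean


def kernel_smote(X_min, n_new, gamma=0.5, k=5, n_pre=None, seed=0):
    """Hypothetical kernel SMOTE: interpolate in RBF feature space, return input-space pre-images."""
    rng = np.random.default_rng(seed)
    m, dim = X_min.shape
    k = min(k, m - 1)
    if n_pre is None:
        n_pre = dim + 1
    K = rbf_kernel(X_min, X_min, gamma)
    # Feature-space squared distance K_ii + K_jj - 2 K_ij, which is 2 - 2 K_ij for the RBF kernel.
    D_feat = 2.0 - 2.0 * K
    np.fill_diagonal(D_feat, np.inf)              # a point is not its own neighbour
    knn = np.argsort(D_feat, axis=1)[:, :k]
    out = np.empty((n_new, dim))
    for t in range(n_new):
        i = rng.integers(m)
        j = knn[i, rng.integers(k)]
        delta = rng.random()
        # phi_new = (1 - delta) phi(x_i) + delta phi(x_j); squared distance to every phi(x_l):
        d_feat = ((1 - delta) ** 2 + delta ** 2 + 2 * delta * (1 - delta) * K[i, j]
                  + 1.0 - 2 * (1 - delta) * K[i] - 2 * delta * K[j])
        # RBF distance relation: d_feat = 2 - 2 exp(-gamma * d_in), solved for d_in.
        d_in = -np.log(np.clip(1.0 - d_feat / 2.0, 1e-12, None)) / gamma
        d_in = np.maximum(d_in, 0.0)              # guard against tiny negative round-off
        nearest = np.argsort(d_in)[:n_pre]
        out[t] = preimage_from_distances(X_min[nearest], d_in[nearest])
    return out


def augment(X, y, minority_label, n_new, **kw):
    """Append n_new synthetic minority samples to the original training set."""
    X_syn = kernel_smote(X[y == minority_label], n_new, **kw)
    return np.vstack([X, X_syn]), np.concatenate([y, np.full(n_new, minority_label)])
```

For example, `X_aug, y_aug = augment(X, y, minority_label=1, n_new=50, gamma=0.5)` adds 50 synthetic minority rows. The augmented set could then be passed to an RBF-kernel SVM such as LIBSVM or scikit-learn's `SVC(kernel="rbf", gamma=0.5)`; keeping γ equal to the value used for oversampling means both steps work in the same feature space, which is the consistency the approach is after.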
