Classification of Imbalanced Data by Combining the Complementary Neural Network and SMOTE Algorithm

In classification, when the training data are unevenly distributed among classes, the learning algorithm is generally dominated by the features of the majority classes, and the features of the minority classes are difficult to recognize fully. This paper proposes a method to improve classification accuracy for the minority classes. The method combines the Synthetic Minority Over-sampling Technique (SMOTE) with the Complementary Neural Network (CMTNN) to handle the problem of classifying imbalanced data. To demonstrate that the proposed technique can assist the classification of imbalanced data, three classification algorithms are used: Artificial Neural Network (ANN), k-Nearest Neighbor (k-NN), and Support Vector Machine (SVM). Benchmark data sets with various ratios between the minority class and the majority class are obtained from the University of California Irvine (UCI) machine learning repository. The results show that the proposed combination of techniques can improve classification performance on the class imbalance problem.
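To make the over-sampling half of the method concrete, the following is a minimal sketch of the interpolation idea behind SMOTE: a synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. It is an illustration under stated assumptions, not the implementation used in the paper. The function name `smote_sketch`, its parameters, and the toy data are hypothetical, base samples are drawn at random rather than visited in turn, and the CMTNN step is not shown because the abstract does not describe it.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only: each synthetic point lies on
    the segment between a randomly chosen minority sample and one of its
    k nearest minority-class neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class instead of duplicating existing samples exactly.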
