Imbalanced data classification using complementary fuzzy support vector machine techniques and SMOTE

A hybrid sampling technique is proposed that combines the Complementary Fuzzy Support Vector Machine (CMTFSVM) with the Synthetic Minority Oversampling Technique (SMOTE) to handle the imbalanced classification problem. The proposed technique uses an optimised membership function to enhance classification performance, and it is compared with three different classifiers. The experiments used four standard benchmark datasets and one real-world dataset of plant cells. The results show that applying CMTFSVM followed by SMOTE gave better results than the other FSVM classifiers on the benchmark datasets. It also gave the best result on the real-world dataset, with a G-mean of 0.9589 and an AUC of 0.9598. It can be concluded that the proposed technique works well on both imbalanced benchmark data and real-world data.
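As a point of reference for the G-mean figure, the sketch below assumes the conventional definition used in imbalanced learning: the geometric mean of sensitivity (recall on the minority class) and specificity (recall on the majority class). The function name `g_mean`, the label encoding, and the toy labels are illustrative assumptions and are not taken from the paper's experiments.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumes the standard definition sqrt(TPR * TNR); `positive` marks
    the minority class (an assumption made for this illustration).
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against a class that is absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data: 2 minority samples, 8 majority samples (made up).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class reaches 80%
# accuracy here, yet its G-mean is zero because sensitivity is zero.
always_majority = [0] * 10
score_majority = g_mean(y_true, always_majority)

# A classifier that finds one of the two minority samples and raises
# one false alarm: sensitivity = 1/2, specificity = 7/8.
mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
score_mixed = g_mean(y_true, mixed)
```

Because the score collapses to zero whenever one class is entirely misclassified, G-mean is a more informative summary than plain accuracy on skewed data, which is presumably why it is reported alongside AUC here.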
