The Study of Synthetic Minority Over-sampling Technique (SMOTE) and Weighted Extreme Learning Machine for Handling Imbalance Problem on Multiclass Microarray Classification

Microarray data classification is challenging because the number of samples is much smaller than the number of genes. The problem becomes harder when the dataset has a multiclass target and the samples are unevenly distributed across the classes (an imbalanced data distribution). This research studies two approaches to handling imbalanced data distribution: SMOTE (a data-level approach) and weighted ELM (an algorithm-level approach). To evaluate their performance, two public imbalanced multiclass microarray datasets are used: GCM (Global Cancer Map) and Subtypes-Leukemia. The experimental results show that applying SMOTE and weighted ELM to the GCM dataset has no significant effect on classification performance. On the Subtypes-Leukemia dataset, by contrast, applying SMOTE and weighted ELM improves classification performance compared with previous research. Overall, the results show that weighted ELM performs slightly better than SMOTE at increasing the accuracy of the minority class.
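The abstract does not give formulas for either approach, so the following is only a minimal sketch of how the two ideas are commonly formulated, not the implementation used in this study. It assumes SMOTE builds each synthetic sample by interpolating between a minority sample and one of its k nearest minority neighbours, and that weighted ELM weights every sample by the inverse of its class count before solving a regularized least-squares problem for the output weights. The function names (`smote`, `train_weighted_elm`, `predict_elm`), the sigmoid activation, and the parameters `k`, `n_hidden`, and `C` are illustrative assumptions.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Data-level approach: synthesize n_new points from minority samples X_min."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)
    # Pairwise squared distances inside the minority class (fine for small n).
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)  # seed sample for each new point
    nb = neighbours[base, rng.integers(0, k, size=n_new)]  # one random neighbour
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def _hidden(X, W_in, b):
    # Sigmoid written with tanh to avoid overflow for large inputs.
    return 0.5 * (1.0 + np.tanh(0.5 * (X @ W_in + b)))


def train_weighted_elm(X, y, n_hidden=50, C=1.0, rng=None):
    """Algorithm-level approach: ELM with per-sample weights 1 / class count."""
    rng = np.random.default_rng(rng)
    classes, y_idx, counts = np.unique(y, return_inverse=True, return_counts=True)
    # Random, untrained hidden layer (the defining trait of ELM).
    W_in = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = _hidden(X, W_in, b)
    # Targets coded as +1 for the true class and -1 elsewhere.
    T = -np.ones((len(y), len(classes)))
    T[np.arange(len(y)), y_idx] = 1.0
    w = 1.0 / counts[y_idx]  # minority samples receive larger weights
    # Solve (I / C + H^T W H) beta = H^T W T with W = diag(w).
    A = np.eye(n_hidden) / C + H.T @ (w[:, None] * H)
    beta = np.linalg.solve(A, H.T @ (w[:, None] * T))
    return classes, W_in, b, beta


def predict_elm(model, X):
    """Return the class whose output node has the largest activation."""
    classes, W_in, b, beta = model
    return classes[np.argmax(_hidden(X, W_in, b) @ beta, axis=1)]
```

In this sketch, `smote` would be applied to each minority class before training an unweighted classifier, whereas `train_weighted_elm` leaves the data untouched and shifts the correction into the loss. That is the data-level versus algorithm-level distinction the abstract draws.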
