Ensemble classifiers for biomedical data: Performance evaluation

Machine learning offers strong support to biomedical research, providing many opportunities for disease discovery and the identification of related drugs. Medical applications of machine learning have evolved from physicians' needs and are motivated by promising results from empirical studies. Medical support systems can be built on screening data, medical images, pattern classification, and microarray gene expression analysis. Medical data are typically characterized by very high dimensionality and a relatively small number of examples, which makes feature selection a crucial step for improving classification performance. Recent work on classification has produced a powerful scheme known as the ensemble classifier. In this paper, the performance of two ensemble classifiers, Random Forest (RF) and Rotation Forest (ROT), is evaluated on five biomedical datasets. Three different feature selection methods are used to extract the most relevant features in each dataset, and prediction performance is measured by classification accuracy. ROT achieved the highest classification accuracy in most of the tested cases.
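
The following is a minimal sketch of the kind of pipeline the abstract describes: a feature selection step followed by an ensemble classifier, scored by cross-validated accuracy. It assumes scikit-learn; the dataset (load_breast_cancer), the chi-square selector, and the parameter choices (k=10, 100 trees, 10 folds) are illustrative stand-ins, not the paper's actual datasets or settings, and only the Random Forest arm is shown because Rotation Forest has no scikit-learn implementation.

```python
# Sketch: chi-square feature selection + Random Forest, evaluated by
# cross-validated accuracy on a public biomedical dataset (illustrative only).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Stand-in for the paper's five medical datasets.
X, y = load_breast_cancer(return_X_y=True)

pipeline = Pipeline([
    # Keep the 10 features most associated with the class label.
    ("select", SelectKBest(score_func=chi2, k=10)),
    # Ensemble of decision trees trained on bootstrap samples.
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# 10-fold cross-validated classification accuracy.
scores = cross_val_score(pipeline, X, y, cv=10, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

A comparable Rotation Forest run would replace the "rf" step with an implementation that trains each tree on PCA-rotated feature subsets and then compare the two cross-validated accuracy scores.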
