Diverse accurate feature selection for microarray cancer diagnosis

Gene expression microarray data provide simultaneous measurements of the activity of thousands of genes, which makes effective and reliable cancer diagnosis potentially achievable. An important and challenging task in microarray analysis is selecting the most relevant and significant genes for cancer classification. A random subspace ensemble-based method is proposed to address feature selection for gene expression cancer diagnosis. The proposed Diverse Accurate Feature Selection method relies on multiple individual classifiers, each built on a random feature subspace. Each feature is assigned a score computed from the pairwise diversity among the individual classifiers and the ratio between individual and ensemble accuracies. These scores yield a ranked list of features, to which a final classifier is applied, improving performance while using as few genes as possible. The experiments address gene expression cancer diagnosis on publicly available microarray datasets. Numerical results show that the proposed method is competitive with related models from the literature.
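The scoring and selection steps described above can be sketched in Python. This is a minimal, standard-library-only sketch under explicit assumptions, not the authors' implementation. The base learner is a nearest-centroid classifier, pairwise diversity is measured as the disagreement rate between two members, and the rule that combines the two quantities into a member weight, (individual accuracy / ensemble accuracy) × (1 + mean pairwise disagreement), averaged over the members whose subspace contains a gene, is a hypothetical choice because the exact formula is not stated here. The final step grows the ranked list and keeps the smallest gene prefix with the best validation accuracy. All function names (`score_features`, `select_genes`, `make_toy_data`, and so on) and the toy data are illustrative.

```python
import random
from statistics import mean


def fit_centroids(X, y, feats):
    """Nearest-centroid base learner restricted to the feature indices in `feats`."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        total = sums.setdefault(label, [0.0] * len(feats))
        for k, f in enumerate(feats):
            total[k] += row[f]
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, feats, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((row[f] - m) ** 2 for f, m in zip(feats, centroids[c])))


def accuracy(preds, y):
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def disagreement(a, b):
    """Pairwise diversity: fraction of samples on which two members disagree."""
    return sum(p != q for p, q in zip(a, b)) / len(a)


def score_features(Xtr, ytr, Xva, yva, n_members=30, subspace=3, seed=0):
    """Score each feature from the diversity and accuracy of the members that use it."""
    rng = random.Random(seed)
    n_feats = len(Xtr[0])
    members = []  # (random feature subspace, predictions on the validation set)
    for _ in range(n_members):
        feats = rng.sample(range(n_feats), subspace)
        model = fit_centroids(Xtr, ytr, feats)
        members.append((feats, [predict(model, feats, r) for r in Xva]))
    # Ensemble prediction by plain majority vote over all members.
    votes = [[preds[i] for _, preds in members] for i in range(len(yva))]
    ens_acc = accuracy([max(set(v), key=v.count) for v in votes], yva) or 1e-9
    totals, uses = [0.0] * n_feats, [0] * n_feats
    for i, (feats, preds) in enumerate(members):
        others = [disagreement(preds, q) for j, (_, q) in enumerate(members) if j != i]
        diversity = mean(others) if others else 0.0
        # ASSUMED combination: accuracy ratio scaled by a diversity bonus.
        weight = accuracy(preds, yva) / ens_acc * (1.0 + diversity)
        for f in feats:
            totals[f] += weight
            uses[f] += 1
    # A feature's score is the mean weight of the members whose subspace contains it.
    return [t / u if u else 0.0 for t, u in zip(totals, uses)]


def rank_features(scores):
    """Feature indices sorted by decreasing score (ties keep the lower index first)."""
    return sorted(range(len(scores)), key=lambda f: -scores[f])


def select_genes(Xtr, ytr, Xva, yva, ranking, max_k=None):
    """Grow the ranked list and keep the smallest prefix with the best validation accuracy."""
    best_acc, best_feats = -1.0, []
    for k in range(1, (max_k or len(ranking)) + 1):
        feats = ranking[:k]
        model = fit_centroids(Xtr, ytr, feats)
        acc = accuracy([predict(model, feats, r) for r in Xva], yva)
        if acc > best_acc:  # strict '>' keeps the smaller gene set on ties
            best_acc, best_feats = acc, feats
    return best_feats, best_acc


def make_toy_data(n, seed):
    """Hypothetical toy data: feature 0 separates the two classes, features 1-5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.random() + 5 * label] + [rng.random() for _ in range(5)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    Xtr, ytr = make_toy_data(60, seed=1)
    Xva, yva = make_toy_data(60, seed=2)
    ranking = rank_features(score_features(Xtr, ytr, Xva, yva, n_members=40))
    print("ranking:", ranking)
    print("selected genes, validation accuracy:", select_genes(Xtr, ytr, Xva, yva, ranking))
```

Scaling the accuracy ratio by a diversity bonus, rather than adding the two terms, tends to keep an inaccurate member that disagrees with everyone from outscoring an accurate one; other combinations are equally plausible readings of the description. In practice the member predictions, the ensemble accuracy, and the final gene count would be estimated on held-out data or by cross-validation rather than on a single validation split.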
