Incorporating feature ranking and evolutionary methods for the classification of high-dimensional DNA microarray gene expression data.

BACKGROUND: DNA microarray gene expression classification is a challenging task for machine learning. The dimensionality of gene expression data sets typically ranges from several thousand to over 10,000 genes. A potential solution to this issue is to use feature selection to reduce the dimensionality.

AIMS: The aim of this paper is to investigate how feature quality information can be used to improve the precision of microarray gene expression classification.

METHOD: We propose two evolutionary machine learning models based on the eXtended Classifier System (XCS) and a typical feature selection methodology. The first, which we call FS-XCS, uses feature selection for feature reduction. The second, GRD-XCS, uses feature ranking to bias the rule discovery process of XCS.

RESULTS: The results indicate that feature selection/ranking methods are essential for tackling high-dimensional classification tasks such as microarray gene expression classification. However, the results also suggest that using feature ranking to bias the rule discovery process performs significantly better than using feature reduction. In other words, using feature quality information to develop a smarter learning procedure is more efficient than reducing the feature set.

CONCLUSION: Our findings show that extracting feature quality information can assist the learning process and improve classification accuracy. On the other hand, relying exclusively on feature quality information (e.g., through feature reduction) might decrease classification performance. We therefore recommend a hybrid approach that uses feature quality information to direct the learning process by highlighting the more informative features, without preventing the learning process from exploring other features.
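The contrast between the two uses of feature quality information can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the ranking measure or the XCS operators, so the signal-to-noise style score, the `floor` and `ceiling` parameters, the `'#'` don't-care symbol on raw values, and all function names below are assumptions chosen for illustration. `select_top_k` corresponds to the feature reduction idea (FS-XCS style), while `specification_probabilities` and `biased_condition` correspond to the idea of biasing rule discovery with a ranking (GRD-XCS style) while still leaving every feature reachable.

```python
import random
from statistics import mean, pstdev


def feature_scores(X, y):
    """Score each feature with a signal-to-noise style ratio (an assumed measure)."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pstdev(pos) + pstdev(neg)
        scores.append(abs(mean(pos) - mean(neg)) / (spread + 1e-9))
    return scores


def select_top_k(scores, k):
    """Feature reduction: keep only the indices of the k best-scoring features."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def specification_probabilities(scores, floor=0.05, ceiling=0.9):
    """Ranking-based bias: map scores linearly to probabilities in [floor, ceiling].

    A floor above zero keeps low-ranked features explorable.
    """
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0
    return [floor + (ceiling - floor) * (s - lo) / span for s in scores]


def biased_condition(instance, probs, rng):
    """Build a rule condition: feature j is specified with probability probs[j],
    otherwise it becomes the don't-care symbol '#'."""
    return [v if rng.random() < p else "#" for v, p in zip(instance, probs)]


# Toy data (hypothetical): feature 0 separates the classes, feature 2 does not.
X = [[5.0, 1.0, 0.2], [5.2, 0.9, 0.8], [1.0, 1.1, 0.3], [0.8, 1.0, 0.7]]
y = [1, 1, 0, 0]

scores = feature_scores(X, y)
kept = select_top_k(scores, 2)               # hard cut: other features are lost
probs = specification_probabilities(scores)  # soft bias: all features remain
condition = biased_condition(X[0], probs, random.Random(0))
```

With the hard cut, a feature outside the top `k` can never appear in a rule; with the soft bias, it can still be specified with probability `floor`, which mirrors the hybrid recommendation in the conclusion.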
