Comparison of population-based metaheuristics for feature selection: Application to microarray data classification

In this work we compare a particle swarm optimization (PSO) algorithm and a genetic algorithm (GA), both combined with a support vector machine (SVM) classifier, for the classification of high-dimensional microarray data. Both algorithms are used to find small subsets of informative genes among thousands of candidates. An SVM classifier with 10-fold cross-validation is applied to validate and evaluate the candidate solutions. The first contribution is to show that PSOSVM is able to find informative genes and to achieve competitive classification performance. Specifically, a new version of PSO, called geometric PSO, is empirically evaluated for the first time in this work, and it is compared with a new GASVM and with other existing methods from the literature. The second contribution is the discovery of new and challenging results on six public datasets, identifying genes significant in the development of a variety of cancers (leukemia, breast, colon, ovarian, prostate, and lung).
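To make the geometric PSO idea concrete, here is a minimal sketch of a geometric PSO over bit strings (bit `i` = 1 means gene `i` is selected), assuming the usual binary formulation: each particle's new position is a three-parent mask-based crossover of its current position, its personal best and the global best, followed by bit-flip mutation. The function names, the weights `w`, the mutation rate, the swarm size and the `toy_fitness` function are all hypothetical choices for illustration, not the parameters or code used in this work. In the wrapper setting described above, `fitness` would instead return the 10-fold cross-validated SVM accuracy on the selected genes.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bits = List[int]


def three_parent_mask_crossover(
    x: Sequence[int], pbest: Sequence[int], gbest: Sequence[int],
    w: Tuple[float, float, float], rng: random.Random,
) -> Bits:
    """Take each bit from x, pbest or gbest with probability proportional to w."""
    total = sum(w)
    p_x, p_p = w[0] / total, w[1] / total
    child = []
    for i in range(len(x)):
        r = rng.random()
        if r < p_x:
            child.append(x[i])
        elif r < p_x + p_p:
            child.append(pbest[i])
        else:
            child.append(gbest[i])
    return child


def bit_flip_mutation(bits: Sequence[int], rate: float, rng: random.Random) -> Bits:
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def geometric_pso(
    fitness: Callable[[Bits], float], n_bits: int, n_particles: int = 20,
    iterations: int = 50, w: Tuple[float, float, float] = (0.2, 0.3, 0.5),
    mutation_rate: float = 0.01, seed: int = 0,
) -> Tuple[Bits, float]:
    """Maximize `fitness` over bit strings; returns (best mask, best fitness)."""
    rng = random.Random(seed)
    swarm = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    pbest = [list(p) for p in swarm]
    pbest_fit = [fitness(p) for p in swarm]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = list(pbest[g]), pbest_fit[g]
    for _ in range(iterations):
        for i in range(n_particles):
            # GPSO analogue of the velocity/position update: a weighted
            # geometric crossover of current position, personal best and
            # global best (hypothetical weights w), then mutation.
            child = three_parent_mask_crossover(swarm[i], pbest[i], gbest, w, rng)
            swarm[i] = bit_flip_mutation(child, mutation_rate, rng)
            f = fitness(swarm[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = list(swarm[i]), f
                if f > gbest_fit:
                    gbest, gbest_fit = list(swarm[i]), f
    return gbest, gbest_fit


def toy_fitness(bits: Bits) -> float:
    # Hypothetical stand-in for SVM cross-validation accuracy: genes 0-4 are
    # "informative"; every other selected gene costs a small penalty.
    return sum(bits[:5]) - 0.1 * sum(bits[5:])


if __name__ == "__main__":
    best, score = geometric_pso(toy_fitness, n_bits=20)
    print(best, score)
```

With real microarray data, one plausible (assumed) fitness is scikit-learn's `cross_val_score(SVC(), X[:, cols], y, cv=10).mean()`, where `cols` are the indices of the selected bits, returning 0 for an empty selection. Because each fitness call trains ten SVMs, the evaluation cost rather than the swarm update dominates the running time.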