A hybrid GA & back propagation approach for gene selection and classification of microarray data

We propose a Genetic Algorithm (GA) combined with a Neural Network (NN), specifically a MultiLayer Perceptron trained with the Back Propagation (BP) algorithm, for the classification of high-dimensional microarray data. The approach is coupled with a fuzzy-logic-based pre-filtering technique. The GA evolves gene subsets whose fitness is evaluated by the NN classifier. Using archived records of "good" gene subsets, a frequency-based technique is introduced to identify the most informative genes. The approach is assessed on two well-known cancer datasets and achieves results competitive with six existing methods.
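The wrapper loop described above (a GA evolving gene subsets, a classifier scoring each subset, and a frequency count over archived "good" subsets) can be illustrated with a minimal sketch. This is not the authors' implementation. To stay short and dependency-free, the fitness function here is a leave-one-out nearest-centroid accuracy, swapped in for the MLP trained with back propagation, and the fuzzy pre-filtering step is omitted. All function names, the archive threshold, and the GA parameters are hypothetical choices made for illustration.

```python
import random
from collections import Counter

def centroid_accuracy(X, y, genes):
    """Stand-in fitness (assumption): leave-one-out nearest-centroid accuracy
    on the selected genes. The paper uses an MLP trained with BP instead."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Per-class sums and counts, computed without the held-out sample i.
        sums, counts = {}, Counter()
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(genes))
            for k, g in enumerate(genes):
                acc[k] += row[g]
            counts[label] += 1

        def dist(label):
            # Squared Euclidean distance from sample i to the class centroid.
            return sum((X[i][g] - sums[label][k] / counts[label]) ** 2
                       for k, g in enumerate(genes))

        correct += min(sums, key=dist) == y[i]
    return correct / len(X)

def evolve_gene_subsets(X, y, fitness=centroid_accuracy, pop_size=20,
                        generations=15, p_mut=0.05, archive_threshold=0.9,
                        seed=0):
    """Evolve boolean gene masks; archive every subset whose fitness reaches
    archive_threshold (a hypothetical definition of a "good" subset)."""
    rng = random.Random(seed)
    n_genes = len(X[0])  # assumed to be at least 2

    def genes_of(mask):
        return [g for g, on in enumerate(mask) if on]

    pop = [[rng.random() < 0.5 for _ in range(n_genes)]
           for _ in range(pop_size)]
    archive = []
    for _ in range(generations):
        scored = [(fitness(X, y, genes_of(m)), m) for m in pop]
        archive.extend(genes_of(m) for f, m in scored if f >= archive_threshold)
        # Truncation selection: the better half survives and breeds.
        scored.sort(key=lambda t: t[0], reverse=True)
        parents = [m for _, m in scored[:pop_size // 2]]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)          # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([(not bit) if rng.random() < p_mut else bit
                             for bit in child])      # bit-flip mutation
        pop = parents + children
    return archive

def rank_genes_by_frequency(archive, top_k):
    """Rank genes by how often they occur in the archived subsets."""
    freq = Counter(g for subset in archive for g in subset)
    return [g for g, _ in freq.most_common(top_k)]
```

Calling `evolve_gene_subsets(X, y)` on a list of expression rows `X` and class labels `y`, then `rank_genes_by_frequency(archive, top_k)` on the result, returns the indices of the genes that recur most often among high-scoring subsets. Because the fitness function is passed as a parameter, an MLP-based evaluator could be substituted without changing the GA loop.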
