A platform for the selection of genes in DNA microarray data using evolutionary algorithms

This paper presents a flexible framework for the task of feature selection in the classification of DNA microarray data. The user can select a number of filter methods in the preprocessing stage and choose from a wide set of classifiers (models and algorithms from WEKA [17] are available) and accuracy estimation methods. The approach implements wrapper methods in which Evolutionary Algorithms with variable-sized, set-based representations are used to reduce the number of attributes. Two case studies were used to validate the approach, with three distinct classifiers (1-nearest neighbour, decision trees, and SVMs), a filter method based on discriminant fuzzy patterns, and k-fold cross-validation to estimate the generalization error.
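The wrapper idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the operators, the size penalty, the population settings, and every function name below (`one_nn_cv_accuracy`, `mutate`, `crossover`, `evolve`, `make_toy_data`) are assumptions made for illustration, and the data is synthetic. Each individual is a set of gene indices whose size can grow or shrink, and fitness is the k-fold cross-validated accuracy of a 1-nearest-neighbour classifier restricted to those genes.

```python
import random

def one_nn_cv_accuracy(X, y, genes, k=3):
    """k-fold cross-validated accuracy of 1-NN using only the given gene indices."""
    genes = sorted(genes)
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        for i in test_idx:
            # Nearest training sample by squared Euclidean distance on the subset.
            nearest = min(
                train_idx,
                key=lambda j: sum((X[i][g] - X[j][g]) ** 2 for g in genes),
            )
            correct += y[nearest] == y[i]
    return correct / n

def mutate(genes, n_genes, rng):
    """Add, remove, or swap one gene, so the subset size can change."""
    child = set(genes)
    op = rng.choice(["add", "remove", "swap"])
    if op in ("remove", "swap") and len(child) > 1:
        child.remove(rng.choice(sorted(child)))
    if op in ("add", "swap"):
        child.add(rng.randrange(n_genes))
    return frozenset(child)

def crossover(a, b, rng):
    """Keep genes shared by both parents; keep each other gene with probability 0.5."""
    child = set(a & b)
    child.update(g for g in sorted(a ^ b) if rng.random() < 0.5)
    return frozenset(child) if child else a

def evolve(X, y, pop_size=20, generations=15, max_init=5, seed=0):
    """Evolve gene subsets; returns the best subset found as a frozenset."""
    rng = random.Random(seed)
    n_genes = len(X[0])

    def fitness(genes):
        # Accuracy dominates; the small penalty (an assumption) favours fewer genes.
        return one_nn_cv_accuracy(X, y, genes) - 0.001 * len(genes)

    pop = [frozenset(rng.sample(range(n_genes), rng.randint(1, max_init)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), n_genes, rng))
        pop = parents + children
    return max(pop, key=fitness)

def make_toy_data(n_samples=30, n_genes=50, seed=1):
    """Synthetic two-class data; genes 0 and 1 are informative by construction."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        row[0] += 4 * label
        row[1] -= 4 * label
        X.append(row)
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    best = evolve(X, y)
    print(sorted(best), one_nn_cv_accuracy(X, y, best))
```

In a real setting the fitness function would call whichever classifier and accuracy estimator the user selected, and a filter step would typically shrink the gene pool before the evolutionary search starts.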

[1]  Xuefeng Bruce Ling, et al. Multiclass cancer classification and biomarker discovery using GA-based algorithms, 2005, Bioinform.

[2]  Michael I. Jordan, et al. Feature selection for high-dimensional genomic microarray data, 2001, ICML.

[3]  Wei Du, et al. Molecular classification of cancer types from microarray data using the combination of genetic algorithms and support vector machines, 2003, FEBS letters.

[4]  Ed Keedwell, et al. Genetic Algorithms for Gene Expression Analysis, 2003, EvoWorkshops.

[5]  Patrick Tan, et al. Genetic algorithms applied to multi-class prediction for the analysis of gene expression data, 2003, Bioinform.

[6]  Nir Friedman, et al. Scoring Genes for Relevance, 2000.

[7]  Juan M. Corchado, et al. gene‐CBR: A CASE‐BASED REASONIG TOOL FOR CANCER DIAGNOSIS USING MICROARRAY DATA SETS, 2006, Comput. Intell.

[8]  E. Lander, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[9]  Ian Witten, et al. Data Mining, 2000.

[10]  Walter L. Ruzzo, et al. Improved Gene Selection for Classification of Microarrays, 2002, Pacific Symposium on Biocomputing.

[11]  Zbigniew Michalewicz, et al. Genetic Algorithms + Data Structures = Evolution Programs, 1992, Artificial Intelligence.

[12]  R. Verhaak, et al. Prognostically useful gene-expression profiles in acute myeloid leukemia, 2004, The New England journal of medicine.

[13]  A. Levine, et al. Gene assessment and sample classification for gene expression data using a genetic algorithm/k-nearest neighbor method, 2001, Combinatorial chemistry & high throughput screening.

[14]  Hairong Qi. Feature Selection and kNN Fusion in Molecular Classification of Multiple Tumor Types.

[15]  J. M. Deutsch, et al. Evolutionary algorithms for finding optimal gene sets in microarray prediction, 2003, Bioinform.

[16]  Blaise Hanczar, et al. Improving classification of microarray data using prototype-based feature selection, 2003, SKDD.

[17]  Bernhard Schölkopf, et al. Feature selection for support vector machines by means of genetic algorithm, 2003, Proceedings. 15th IEEE International Conference on Tools with Artificial Intelligence.

[18]  J. Stuart Aitken, et al. Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes, 2005, BMC Bioinformatics.