Informative Gene Discovery for Cancer Classification from Microarray Expression Data

Analysis of microarray gene expression data is a recent advance in cancer diagnosis. However, gene expression data typically combine high dimensionality with a small sample size, and these properties make classification difficult. Gene selection is therefore a crucial pre-processing step that filters out uninformative genes before classification. Our gene selection method combines an information-theoretic approach with sequential forward floating search. Experimental results show that the method efficiently finds a compact set of informative genes that effectively discriminates between the different classes.
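
To make the combination of an information-theoretic score with sequential forward floating search (SFFS) concrete, the following is a minimal sketch and not the authors' implementation. It assumes expression values have already been discretized (for example into low/high), and it assumes the score of a gene subset is the empirical joint mutual information between the selected genes and the class label; the abstract does not state the exact criterion. The names `mutual_information`, `subset_score`, and `sffs` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def subset_score(data, labels, subset):
    """Assumed criterion: joint mutual information between selected genes and class."""
    if not subset:
        return 0.0
    joint = [tuple(row[g] for g in sorted(subset)) for row in data]
    return mutual_information(joint, labels)


def sffs(data, labels, k):
    """Sequential forward floating search for a subset of k genes (k <= number of genes)."""
    n_genes = len(data[0])
    selected = set()
    best = {}  # subset size -> (best score seen, subset achieving it)
    while len(selected) < k:
        # Forward step: add the gene that maximizes the criterion.
        candidates = [g for g in range(n_genes) if g not in selected]
        g_add = max(candidates, key=lambda g: subset_score(data, labels, selected | {g}))
        selected = selected | {g_add}
        score = subset_score(data, labels, selected)
        size = len(selected)
        if size not in best or score > best[size][0]:
            best[size] = (score, frozenset(selected))
        else:
            selected = set(best[size][1])  # fall back to the best subset of this size
        # Floating step: drop genes while that beats the best smaller subset.
        while len(selected) > 2:
            g_drop = max(selected, key=lambda g: subset_score(data, labels, selected - {g}))
            reduced = selected - {g_drop}
            reduced_score = subset_score(data, labels, reduced)
            if reduced_score > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (reduced_score, frozenset(reduced))
            else:
                break
    return sorted(best[k][1]), best[k][0]


# Toy data (made up for illustration): the class is the XOR of genes 0 and 1,
# and gene 2 is independent of the class.
data = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [a ^ b for a, b, _ in data]
genes, score = sffs(data, labels, 2)
```

With so few samples, the joint mutual information estimate becomes unreliable as the subset grows, which is one reason a compact gene set is desirable in this setting.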
