Tumor classification and marker gene prediction by feature selection and fuzzy c-means clustering using microarray data

BackgroundUsing DNA microarrays, we have developed two novel models for tumor classification and target gene prediction. First, gene expression profiles are summarized by optimally selected Self-Organizing Maps (SOMs), followed by tumor sample classification by Fuzzy C-means clustering. Then, the prediction of marker genes is accomplished by either manual feature selection (visualizing the weighted/mean SOM component plane) or automatic feature selection (by pair-wise Fisher's linear discriminant).ResultsThe proposed models were tested on four published datasets: (1) Leukemia (2) Colon cancer (3) Brain tumors and (4) NCI cancer cell lines. The models gave class prediction with markedly reduced error rates compared to other class prediction approaches, and the importance of feature selection on microarray data analysis was also emphasized.ConclusionsOur models identify marker genes with predictive potential, often better than other available methods in the literature. The models are potentially useful for medical diagnostics and may reveal some insights into cancer classification. Additionally, we illustrated two limitations in tumor classification from microarray data related to the biology underlying the data, in terms of (1) the class size of data, and (2) the internal structure of classes. These limitations are not specific for the classification models used.

[1]  R. Tibshirani,et al.  Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[2]  Sankar K. Pal,et al.  Fuzzy models for pattern recognition : methods that search for structures in data , 1992 .

[3]  S. Dudoit,et al.  Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data , 2002 .

[4]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[5]  Juha Vesanto,et al.  SOM-based data visualization methods , 1999, Intell. Data Anal..

[6]  Russ B. Altman,et al.  Missing value estimation methods for DNA microarrays , 2001, Bioinform..

[7]  Nir Friedman,et al.  Tissue classification with gene expression profiles , 2000, RECOMB '00.

[8]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[9]  Jan Komorowski,et al.  Predicting gene ontology biological process from temporal gene expression patterns. , 2003, Genome research.

[10]  J. Mesirov,et al.  Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[11]  Christian A. Rees,et al.  Systematic variation in gene expression patterns in human cancer cell lines , 2000, Nature Genetics.

[12]  Sayan Mukherjee,et al.  Molecular classification of multiple tumor types , 2001, ISMB.

[13]  Charles E. Heckler,et al.  Applied Multivariate Statistical Analysis , 2005, Technometrics.

[14]  Peter Bühlmann,et al.  Supervised clustering of genes , 2002, Genome Biology.

[15]  M. Ringnér,et al.  Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks , 2001, Nature Medicine.

[16]  M. Eisen,et al.  Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering , 2002, Genome Biology.

[17]  Ash A. Alizadeh,et al.  Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling , 2000, Nature.

[18]  Junbai Wang,et al.  Clustering of the SOM easily reveals distinct gene expression patterns: results of a reanalysis of lymphoma study , 2002, BMC Bioinformatics.

[19]  T. Poggio,et al.  Prediction of central nervous system embryonal tumour outcome based on gene expression , 2002, Nature.

[20]  Danh V. Nguyen,et al.  Tumor classification by partial least squares using microarray gene expression data , 2002, Bioinform..

[21]  Nello Cristianini,et al.  Support vector machine classification and validation of cancer tissue samples using microarray expression data , 2000, Bioinform..