Class discovery and classification of tumor samples using mixture modeling of gene expression data - a unified approach

MOTIVATION: DNA microarray technology is increasingly used in cancer research. In the literature, the discovery of putative classes and the classification of samples into known classes based on gene expression data have largely been treated as separate problems. This paper offers a unified approach to class discovery and classification, which we believe is more appropriate and more widely applicable in practical situations.

RESULTS: We model the gene expression profile of a tumor sample as a draw from a finite mixture distribution, with each component characterizing the gene expression levels in a class. The proposed method was applied to a leukemia dataset with good results: with appropriate choices of genes and preprocessing method, the number of leukemia types and subtypes was correctly inferred, and all tumor samples were correctly classified into their respective type or subtype. The method was further evaluated on other variants of the leukemia data and on a colon dataset.
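The idea of treating class discovery and classification as one mixture-model problem can be illustrated with a minimal sketch. The abstract does not state the component densities, the fitting algorithm, or the rule for choosing the number of classes, so everything below is an assumption made for illustration: Gaussian components with diagonal covariance, fitting by EM, model selection by BIC, and a small synthetic dataset standing in for expression profiles. The function names (`fit_mixture`, `farthest_point_init`, and so on) are hypothetical and are not taken from the paper.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance, evaluated at x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def e_step(data, weights, means, variances):
    """Return posterior class probabilities per sample and the log-likelihood."""
    resp, loglik = [], 0.0
    for x in data:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp trick for numerical stability
        log_norm = top + math.log(sum(math.exp(l - top) for l in logs))
        loglik += log_norm
        resp.append([math.exp(l - log_norm) for l in logs])
    return resp, loglik


def farthest_point_init(data, k):
    """Deterministic start: first sample, then the sample farthest from chosen means."""
    means = [list(data[0])]
    while len(means) < k:
        dist = lambda x: min(sum((a - b) ** 2 for a, b in zip(x, m)) for m in means)
        means.append(list(max(data, key=dist)))
    return means


def fit_mixture(data, k, n_iter=50, var_floor=1e-3):
    """Fit a k-component mixture by EM (assumed choice); return BIC and hard labels."""
    n, d = len(data), len(data[0])
    means = farthest_point_init(data, k)
    centre = [sum(x[j] for x in data) / n for j in range(d)]
    pooled = [max(sum((x[j] - centre[j]) ** 2 for x in data) / n, var_floor)
              for j in range(d)]
    variances = [list(pooled) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp, _ll = e_step(data, weights, means, variances)
        for c in range(k):  # M-step: responsibility-weighted updates
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc
                        for j in range(d)]
            variances[c] = [max(sum(r[c] * (x[j] - means[c][j]) ** 2
                                    for r, x in zip(resp, data)) / nc, var_floor)
                            for j in range(d)]
    resp, loglik = e_step(data, weights, means, variances)
    n_params = (k - 1) + 2 * k * d  # mixing weights + means + variances
    bic = -2.0 * loglik + n_params * math.log(n)
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return {"bic": bic, "labels": labels, "weights": weights, "means": means}


# Synthetic stand-in for expression profiles: 2 classes, 3 "genes", 20 samples each.
rng = random.Random(0)
data = ([[rng.gauss(0.0, 0.5) for _ in range(3)] for _ in range(20)] +
        [[rng.gauss(5.0, 0.5) for _ in range(3)] for _ in range(20)])

# Class discovery: compare candidate numbers of classes by BIC (lower is better).
fits = {k: fit_mixture(data, k) for k in range(1, 5)}
best_k = min(fits, key=lambda k: fits[k]["bic"])

# Classification: each sample goes to the component with the highest posterior.
print(best_k, fits[best_k]["labels"])
```

In this sketch the same fitted model serves both purposes: comparing BIC across values of `k` addresses class discovery, while the posterior probabilities from the E-step assign each sample to a class. Real expression data have far more genes than samples, which is presumably why the abstract stresses the choice of genes and preprocessing; the sketch does not attempt either.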
