POE: Statistical Methods for Qualitative Analysis of Gene Expression

In many gene expression studies, the goals include discovering novel biological classes and identifying genes whose expression can be reliably associated with these classes. Here we present a statistical approach that supports both goals. The key idea is to model gene expression using latent categories that can be interpreted as a gene being turned “on” or “off” relative to a baseline level of expression. This three-way categorization (on, off, or baseline) is used to define a reference in the unsupervised setting, to remove noise prior to clustering, to define molecular subclasses in a way that is portable across platforms, and to define easily interpretable probability-based distance measures for visualization, mining, and clustering.
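
The abstract does not specify the distributions behind the latent categories, so the following is only a minimal sketch under stated assumptions, not the POE model itself. It assumes baseline expression is Normal(mu, sigma^2), that "off" values are spread uniformly below the baseline mean and "on" values uniformly above it, and that all parameters are fixed and shared across genes (in practice they would be gene-specific and estimated from data). Under those assumptions the three-way categorization reduces to Bayes' rule. The names category_posteriors, disagreement, kappa_minus, kappa_plus, pi_minus and pi_plus are hypothetical, as is the particular probability-based dissimilarity shown (the probability that two samples fall in different categories, averaged over genes).

```python
import math
from typing import List, Tuple

Probs = Tuple[float, float, float]  # (P(off), P(baseline), P(on)) for one gene


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def uniform_pdf(x: float, lo: float, hi: float) -> float:
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


def category_posteriors(x: float, mu: float, sigma: float,
                        kappa_minus: float, kappa_plus: float,
                        pi_minus: float, pi_plus: float) -> Probs:
    """Bayes' rule over three latent categories for one expression value.

    Illustrative (assumed) model:
      baseline: Normal(mu, sigma^2)           prior 1 - pi_minus - pi_plus
      "off":    Uniform(mu - kappa_minus, mu)  prior pi_minus
      "on":     Uniform(mu, mu + kappa_plus)   prior pi_plus
    """
    w_off = pi_minus * uniform_pdf(x, mu - kappa_minus, mu)
    w_base = (1.0 - pi_minus - pi_plus) * normal_pdf(x, mu, sigma)
    w_on = pi_plus * uniform_pdf(x, mu, mu + kappa_plus)
    total = w_off + w_base + w_on
    if total == 0.0:
        # x is outside both uniform supports and the normal density underflowed:
        # assign it to the tail on its side of the baseline mean.
        return (0.0, 0.0, 1.0) if x > mu else (1.0, 0.0, 0.0)
    return (w_off / total, w_base / total, w_on / total)


def disagreement(a: List[Probs], b: List[Probs]) -> float:
    """Average over genes of P(samples a and b fall in different categories),
    treating the two samples' categories as independent (a hypothetical choice)."""
    per_gene = [1.0 - sum(pa * pb for pa, pb in zip(ga, gb))
                for ga, gb in zip(a, b)]
    return sum(per_gene) / len(per_gene)


# Usage: three genes measured in two samples, with assumed shared parameters.
params = dict(mu=0.0, sigma=1.0, kappa_minus=10.0, kappa_plus=10.0,
              pi_minus=0.1, pi_plus=0.1)
sample_1 = [category_posteriors(x, **params) for x in (5.0, 0.0, -5.0)]
sample_2 = [category_posteriors(x, **params) for x in (5.0, 0.0, 5.0)]
print(disagreement(sample_1, sample_2))
```

The third gene, called "off" in one sample and "on" in the other, contributes a per-gene disagreement close to 1, while genes in the same confident category contribute close to 0. Because this measure reflects classification uncertainty, it is not exactly zero when a sample is compared with itself; it is better read as a dissimilarity than as a metric.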
