Context-specific infinite mixtures for clustering gene expression profiles across diverse microarray dataset

MOTIVATION Identifying groups of co-regulated genes by monitoring their expression over various experimental conditions is complicated by the fact that such co-regulation is condition-specific. Ignoring the context-specific nature of co-regulation significantly reduces the ability of clustering procedures to detect co-expressed genes due to additional 'noise' introduced by non-informative measurements. RESULTS We have developed a novel Bayesian hierarchical model and corresponding computational algorithms for clustering gene expression profiles across diverse experimental conditions and studies that accounts for context-specificity of gene expression patterns. The model is based on the Bayesian infinite mixtures framework and does not require a priori specification of the number of clusters. We demonstrate that explicit modeling of context-specificity results in increased accuracy of the cluster analysis by examining the specificity and sensitivity of clusters in microarray data. We also demonstrate that probabilities of co-expression derived from the posterior distribution of clusterings are valid estimates of statistical significance of created clusters. AVAILABILITY The open-source package gimm is available at http://eh3.uc.edu/gimm.

[1]  D. Koller,et al.  A module map showing conditional activity of expression modules in cancer , 2004, Nature Genetics.

[2]  Sandra Day O'Connor Vindicating the Rule of Law: The Role of the Judiciary , 2003 .

[3]  Nicola J. Rinaldi,et al.  Transcriptional Regulatory Networks in Saccharomyces cerevisiae , 2002, Science.

[4]  Ronald W. Davis,et al.  The core meiotic transcriptome in budding yeasts , 2000, Nature Genetics.

[5]  D. Pe’er,et al.  Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data , 2003, Nature Genetics.

[6]  David J. Spiegelhalter,et al.  Probabilistic Networks and Expert Systems , 1999, Information Science and Statistics.

[7]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[8]  Michael I. Jordan,et al.  Probabilistic Networks and Expert Systems , 1999 .

[9]  Adrian F. M. Smith,et al.  Sampling-Based Approaches to Calculating Marginal Densities , 1990 .

[10]  Ka Yee Yeung,et al.  Bayesian mixture model based clustering of replicated microarray data , 2004, Bioinform..

[11]  Mario Medvedovic,et al.  Bayesian Model-Averaging in Unsupervised Learning From Microarray Data , 2004, BIOKDD.

[12]  Susumu Goto,et al.  The KEGG resource for deciphering the genome , 2004, Nucleic Acids Res..

[13]  Joshua M. Stuart,et al.  A Gene-Coexpression Network for Global Discovery of Conserved Genetic Modules , 2003, Science.

[14]  Ronald W. Davis,et al.  A genome-wide transcriptional analysis of the mitotic cell cycle. , 1998, Molecular cell.

[15]  Kerby Shedden,et al.  Microarray analysis of gene expression during the cell cycle , 2003, Cell & chromosome.

[16]  S. Rafii,et al.  Splitting vessels: Keeping lymph apart from blood , 2003, Nature Medicine.

[17]  Nir Friedman,et al.  Context-specific Bayesian clustering for gene expression data , 2001, J. Comput. Biol..

[18]  Mario Medvedovic,et al.  Bayesian infinite mixture model based clustering of gene expression profiles , 2002, Bioinform..

[19]  David B. Dunson,et al.  Bayesian Data Analysis , 2010 .

[20]  Geoffrey J. McLachlan,et al.  A mixture model-based approach to the clustering of microarray expression data , 2002, Bioinform..

[21]  M. Escobar,et al.  Markov Chain Sampling Methods for Dirichlet Process Mixture Models , 2000 .