A semi-parametric Bayesian model for unsupervised differential co-expression analysis

BackgroundDifferential co-expression analysis is an emerging strategy for characterizing disease related dysregulation of gene expression regulatory networks. Given pre-defined sets of biological samples, such analysis aims at identifying genes that are co-expressed in one, but not in the other set of samples.ResultsWe developed a novel probabilistic framework for jointly uncovering contexts (i.e. groups of samples) with specific co-expression patterns, and groups of genes with different co-expression patterns across such contexts. In contrast to current clustering and bi-clustering procedures, the implicit similarity measure in this model used for grouping biological samples is based on the clustering structure of genes within each sample and not on traditional measures of gene expression level similarities. Within this framework, biological samples with widely discordant expression patterns can be placed in the same context as long as the co-clustering structure of genes is concordant within these samples. To the best of our knowledge, this is the first method to date for unsupervised differential co-expression analysis in this generality. When applied to the problem of identifying molecular subtypes of breast cancer, our method identified reproducible patterns of differential co-expression across several independent expression datasets. Sample groupings induced by these patterns were highly informative of the disease outcome. Expression patterns of differentially co-expressed genes provided new insights into the complex nature of the ERα regulatory network.ConclusionsWe demonstrated that the use of the co-clustering structure as the similarity measure in the unsupervised analysis of sample gene expression profiles provides valuable information about expression regulatory networks.

[1]  Nicola J. Rinaldi,et al.  Computational discovery of gene modules and regulatory networks , 2003, Nature Biotechnology.

[2]  Liang Chen,et al.  A statistical method for identifying differential gene-gene co-expression patterns , 2004, Bioinform..

[3]  Roded Sharan,et al.  Discovering statistically significant biclusters in gene expression data , 2002, ISMB.

[4]  Mario Medvedovic,et al.  Genomics Portals: integrative web-platform for mining genomics data , 2010, BMC Genomics.

[5]  Ju Han Kim,et al.  Identifying set-wise differential co-expression in gene expression microarray data , 2009, BMC Bioinformatics.

[6]  Benjamin Haibe-Kains,et al.  A comparative study of survival models for breast cancer prognostication based on microarray data: does a single gene beat them all? , 2008, Bioinform..

[7]  Jeffrey T. Chang,et al.  Oncogenic pathway signatures in human cancers as a guide to targeted therapies , 2006, Nature.

[8]  Lothar Thiele,et al.  A systematic comparison and evaluation of biclustering methods for gene expression data , 2006, Bioinform..

[9]  George M. Church,et al.  Biclustering of Expression Data , 2000, ISMB.

[10]  Mario Medvedovic,et al.  Bayesian infinite mixture model based clustering of gene expression profiles , 2002, Bioinform..

[11]  Michael Watson,et al.  CoXpress: differential co-expression in gene expression data , 2006, BMC Bioinformatics.

[12]  R. Tibshirani,et al.  Repeated observation of breast tumor subtypes in independent gene expression data sets , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[13]  Ka Yee Yeung,et al.  Bayesian mixture model based clustering of replicated microarray data , 2004, Bioinform..

[14]  D. Pe’er,et al.  Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data , 2003, Nature Genetics.

[15]  Mario Medvedovic,et al.  LRpath: a logistic regression approach for identifying enriched biological groups in gene expression data , 2009, Bioinform..

[16]  L. Holmberg,et al.  Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts , 2005, Breast Cancer Research.

[17]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[18]  J. Bergh,et al.  Strong Time Dependence of the 76-Gene Prognostic Signature for Node-Negative Breast Cancer Patients in the TRANSBIG Multicenter Independent Validation Series , 2007, Clinical Cancer Research.

[19]  J. Haerting,et al.  Gene-expression signatures in breast cancer. , 2003, The New England journal of medicine.

[20]  John H. White,et al.  Mechanisms of primary and secondary estrogen target gene regulation in breast cancer cells , 2007, Nucleic acids research.

[21]  P. Hall,et al.  An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[22]  Christina Kendziorski,et al.  Statistical methods for gene set co-expression analysis , 2009, Bioinform..

[23]  Sangsoo Kim,et al.  Gene expression Differential coexpression analysis using microarray data and its application to human cancer , 2005 .

[24]  Rainer Spang,et al.  Finding disease specific alterations in the co-expression of genes , 2004, ISMB/ECCB.

[25]  Eckart Zitzler,et al.  BicAT: a biclustering analysis toolbox , 2006, Bioinform..

[26]  David J. Reiss,et al.  Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks , 2006, BMC Bioinformatics.

[27]  Ka Yee Yeung,et al.  Context-specific infinite mixtures for clustering gene expression profiles across diverse microarray dataset , 2006, Bioinform..

[28]  M. Escobar,et al.  Markov Chain Sampling Methods for Dirichlet Process Mixture Models , 2000 .

[29]  I Kimber,et al.  Anti-proliferative effect of estrogen in breast cancer cells that re-express ERalpha is mediated by aberrant regulation of cell cycle genes. , 2005, Journal of molecular endocrinology.

[30]  R. Myers,et al.  Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data , 2005, Nucleic acids research.

[31]  T. Ferguson A Bayesian Analysis of Some Nonparametric Problems , 1973 .

[32]  A. Nobel,et al.  The molecular portraits of breast tumors are conserved across microarray platforms , 2006, BMC Genomics.

[33]  R. Tibshirani,et al.  Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[34]  Heather J. Ruskin,et al.  Techniques for clustering gene expression data , 2008, Comput. Biol. Medicine.

[35]  H. Kölbl,et al.  The humoral immune system has a key prognostic impact in node-negative breast cancer. , 2008, Cancer research.

[36]  M. J. van de Vijver,et al.  Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. , 2006, Journal of the National Cancer Institute.

[37]  Gianluca Bontempi,et al.  Predicting prognosis using molecular profiling in estrogen receptor-positive breast cancer treated with tamoxifen , 2008, BMC Genomics.

[38]  Adrian F. M. Smith,et al.  Sampling-Based Approaches to Calculating Marginal Densities , 1990 .

[39]  Clifford A. Meyer,et al.  Genome-wide analysis of estrogen receptor binding sites , 2006, Nature Genetics.

[40]  Olga G. Troyanskaya,et al.  Detailing regulatory networks through large scale data integration , 2009, Bioinform..

[41]  Mario Medvedovic,et al.  Bayesian Model-Averaging in Unsupervised Learning From Microarray Data , 2004, BIOKDD.

[42]  Radford M. Neal Markov Chain Sampling Methods for Dirichlet Process Mixture Models , 2000 .

[43]  David J. Spiegelhalter,et al.  Probabilistic Networks and Expert Systems , 1999, Information Science and Statistics.

[44]  Q. Wang,et al.  Clustering methods for microarray gene expression data. , 2006, Omics : a journal of integrative biology.

[45]  J. Mosley,et al.  Cell cycle correlated genes dictate the prognostic power of breast cancer gene lists , 2008, BMC Medical Genomics.

[46]  D. Allison,et al.  Microarray data analysis: from disarray to consolidation and consensus , 2006, Nature Reviews Genetics.

[47]  Mario Medvedovic,et al.  Bayesian hierarchical model for transcriptional module discovery by jointly modeling gene expression and ChIP-chip data , 2007, BMC Bioinformatics.

[48]  Terence P. Speed,et al.  A comparison of normalization methods for high density oligonucleotide array data based on variance and bias , 2003, Bioinform..

[49]  H. Stunnenberg,et al.  Genomic actions of estrogen receptor alpha: what are the targets and how are they regulated? , 2009, Endocrine-related cancer.

[50]  Dennis B. Troup,et al.  NCBI GEO: archive for high-throughput functional genomic data , 2008, Nucleic Acids Res..

[51]  Zhen Hu,et al.  BMC Bioinformatics BioMed Central Methodology article CLEAN: CLustering Enrichment ANalysis , 2009 .

[52]  Yudong D. He,et al.  Gene expression profiling predicts clinical outcome of breast cancer , 2002, Nature.

[53]  Christian A. Rees,et al.  Molecular portraits of human breast tumours , 2000, Nature.

[54]  Identifying statistically significant patterns of expression via Bayesian Infinite Mixture Models , 2000 .

[55]  Lusheng Wang,et al.  Computing the maximum similarity bi-clusters of gene expression data , 2007, Bioinform..

[56]  J. Tchinda,et al.  Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. , 2006, Science.

[57]  Antonio Reverter,et al.  A Differential Wiring Analysis of Expression Data Correctly Identifies the Gene Containing the Causal Mutation , 2009, PLoS Comput. Biol..