M-BISON: Microarray-based integration of data sources using networks

BackgroundThe accurate detection of differentially expressed (DE) genes has become a central task in microarray analysis. Unfortunately, the noise level and experimental variability of microarrays can be limiting. While a number of existing methods partially overcome these limitations by incorporating biological knowledge in the form of gene groups, these methods sacrifice gene-level resolution. This loss of precision can be inappropriate, especially if the desired output is a ranked list of individual genes. To address this shortcoming, we developed M-BISON (Microarray-Based Integration of data SOurces using Networks), a formal probabilistic model that integrates background biological knowledge with microarray data to predict individual DE genes.ResultsM-BISON improves signal detection on a range of simulated data, particularly when using very noisy microarray data. We also applied the method to the task of predicting heat shock-related differentially expressed genes in S. cerevisiae, using an hsf1 mutant microarray dataset and conserved yeast DNA sequence motifs. Our results demonstrate that M-BISON improves the analysis quality and makes predictions that are easy to interpret in concert with incorporated knowledge. Specifically, M-BISON increases the AUC of DE gene prediction from .541 to .623 when compared to a method using only microarray data, and M-BISON outperforms a related method, GeneRank. Furthermore, by analyzing M-BISON predictions in the context of the background knowledge, we identified YHR124W as a potentially novel player in the yeast heat shock response.ConclusionThis work provides a solid foundation for the principled integration of imperfect biological knowledge with gene expression data and other high-throughput data sources.

[1]  Thomas Lengauer,et al.  ROCR: visualizing classifier performance in R , 2005, Bioinform..

[2]  Desmond J. Higham,et al.  GeneRank: Using search engine technology for the analysis of microarray experiments , 2005, BMC Bioinformatics.

[3]  William Stafford Noble,et al.  The effect of replication on gene expression microarray experiments , 2003, Bioinform..

[4]  Joaquín Dopazo,et al.  FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes , 2004, Bioinform..

[5]  Z. Szallasi,et al.  Reliability and reproducibility issues in DNA microarray measurements. , 2006, Trends in genetics : TIG.

[6]  D. Cavalieri,et al.  Fundamentals of cDNA microarray data analysis. , 2003, Trends in genetics : TIG.

[7]  P. Khatri,et al.  Global functional profiling of gene expression ? ? This work was funded in part by a Sun Microsystem , 2003 .

[8]  Ryszard Maleszka,et al.  Microarray reality checks in the context of a complex disease , 2004, Nature Biotechnology.

[9]  Rainer Breitling,et al.  Iterative Group Analysis (iGA): A simple tool to enhance sensitivity and facilitate interpretation of microarray experiments , 2004, BMC Bioinformatics.

[10]  Gordon K Smyth,et al.  Statistical Applications in Genetics and Molecular Biology Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments , 2011 .

[11]  S. Kasif,et al.  Whole-genome annotation by using evidence integration in functional-linkage networks. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[12]  Geoffrey E. Hinton,et al.  A Learning Algorithm for Boltzmann Machines , 1985, Cogn. Sci..

[13]  Igor Jurisica,et al.  Modeling interactome: scale-free or geometric? , 2004, Bioinform..

[14]  T. Fukasawa,et al.  A novel domain of the yeast heat shock factor that regulates its activation function. , 2001, Biochemical and biophysical research communications.

[15]  B. Birren,et al.  Sequencing and comparison of yeast species to identify genes and regulatory elements , 2003, Nature.

[16]  S. Dudoit,et al.  Microarray expression profiling identifies genes with altered expression in HDL-deficient mice. , 2000, Genome research.

[17]  R. Lempicki,et al.  Evaluation of gene expression measurements from commercial microarray platforms. , 2003, Nucleic acids research.

[18]  Ingrid Lönnstedt Replicated microarray data , 2001 .

[19]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[20]  H. Sakurai,et al.  Mutated Yeast Heat Shock Transcription Factor Activates Transcription Independently of Hyperphosphorylation* , 2006, Journal of Biological Chemistry.

[21]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[22]  Peng Xiao,et al.  Hotelling’s T 2 multivariate profiling for detecting differential expression in microarrays , 2005 .

[23]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[24]  David Botstein,et al.  GO: : TermFinder--open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes , 2004, Bioinform..

[25]  Pablo Tamayo,et al.  Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[26]  S G Oliver,et al.  Transcriptome profiling of a Saccharomyces cerevisiae mutant with a constitutively activated Ras/cAMP pathway. , 2003, Physiological genomics.

[27]  Dong Xu,et al.  Global protein function annotation through mining genome-scale data in yeast Saccharomyces cerevisiae. , 2004, Nucleic acids research.

[28]  R. Tibshirani,et al.  Significance analysis of microarrays applied to the ionizing radiation response , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[29]  Carl Wu,et al.  Heat shock transcription factors: structure and regulation. , 1995, Annual review of cell and developmental biology.

[30]  Joaquín Dopazo,et al.  Discovering molecular functions significantly related to phenotypes by combining gene expression data and biological information , 2005, Bioinform..

[31]  Ronald W. Davis,et al.  Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray , 1995, Science.

[32]  D. Pe’er,et al.  Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data , 2003, Nature Genetics.

[33]  T. Fukasawa,et al.  Carboxy‐terminal region of the yeast heat shock factor contains two domains that make transcription independent of the TFIIH protein kinase , 2003, Genes to cells : devoted to molecular & cellular mechanisms.

[34]  M. Kanehisa A database for post-genome analysis. , 1997, Trends in genetics : TIG.

[35]  V. Iyer,et al.  Genome-Wide Analysis of the Biology of Stress Responses through Heat Shock Transcription Factor , 2004, Molecular and Cellular Biology.