Context-sensitive data integration and prediction of biological networks

Motivation: Several recent methods have addressed the problem of heterogeneous data integration and network prediction by modeling the noise inherent in high-throughput genomic datasets, which can dramatically improve specificity and sensitivity and allow the robust integration of datasets with heterogeneous properties. However, experimental technologies capture different biological processes with varying degrees of success, and thus, each source of genomic data can vary in relevance depending on the biological process one is interested in predicting. Accounting for this variation can significantly improve network prediction, but to our knowledge, no previous approaches have explicitly leveraged this critical information about biological context.

Results: We confirm the presence of context-dependent variation in functional genomic data and propose a Bayesian approach for context-sensitive integration and query-based recovery of biological process-specific networks. By applying this method to Saccharomyces cerevisiae, we demonstrate that leveraging contextual information can significantly improve the precision of network predictions, including assignment for uncharacterized genes. We expect that this general context-sensitive approach can be applied to other organisms and prediction scenarios.

Availability: A software implementation of our approach is available on request from the authors.
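To make the idea of context-sensitive integration concrete, the following is a minimal sketch and not the authors' implementation. It assumes a naive Bayes model in which each dataset contributes independent evidence about whether a gene pair is functionally related, and in which the conditional probability tables are learned separately for each biological context. The function name `posterior_related`, the dataset names, the context names, and every probability value are hypothetical and chosen only for illustration.

```python
def posterior_related(prior, likelihoods, evidence):
    """Posterior probability that a gene pair is functionally related.

    Assumes naive Bayes: datasets are conditionally independent given
    the relationship status (an assumption of this sketch).
    likelihoods: dataset -> (P(value | related), P(value | unrelated)) dicts
    evidence:    dataset -> observed, discretized value for the gene pair
    """
    p_rel = prior
    p_unrel = 1.0 - prior
    for dataset, value in evidence.items():
        given_rel, given_unrel = likelihoods[dataset]
        p_rel *= given_rel[value]
        p_unrel *= given_unrel[value]
    return p_rel / (p_rel + p_unrel)


# Hypothetical context-specific likelihood tables. The same dataset is
# informative in one context and nearly uninformative in another.
CONTEXT_LIKELIHOODS = {
    "ribosome biogenesis": {
        "coexpression": ({"high": 0.8, "low": 0.2}, {"high": 0.1, "low": 0.9}),
        "two_hybrid": ({"yes": 0.05, "no": 0.95}, {"yes": 0.04, "no": 0.96}),
    },
    "cell polarity": {
        "coexpression": ({"high": 0.3, "low": 0.7}, {"high": 0.2, "low": 0.8}),
        "two_hybrid": ({"yes": 0.4, "no": 0.6}, {"yes": 0.02, "no": 0.98}),
    },
}

# One gene pair's evidence, scored separately under each context.
evidence = {"coexpression": "high", "two_hybrid": "no"}
scores = {
    context: posterior_related(0.1, tables, evidence)
    for context, tables in CONTEXT_LIKELIHOODS.items()
}

for context, score in scores.items():
    print(f"{context}: {score:.3f}")
```

In this toy setup the same evidence yields a much higher posterior in the context where coexpression is informative, which is the kind of context-dependent variation in dataset relevance that the abstract describes.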
