Global networks of functional coupling in eukaryotes from comprehensive data integration.

No single experimental method can discover all connections in the interactome. A computational approach can help by integrating data from multiple, often unrelated, proteomics and genomics pipelines. Reconstructing global networks of functional coupling (FC) faces two challenges, scale and heterogeneity: how to integrate huge amounts of diverse data from multiple organisms efficiently while still ensuring high accuracy. We developed FunCoup, an optimized Bayesian framework, to resolve these issues. Because interactomes comprise functional coupling of many types, FunCoup annotates network edges with confidence scores in support of different kinds of interactions: physical interaction, protein complex membership, metabolic link, or signaling link. This capability boosted overall accuracy. The framework was tested comprehensively to optimize overall confidence and to ensure seamless, automated incorporation of new data sets of heterogeneous types. Using over 50 data sets in seven organisms and extensively transferring information between orthologs, FunCoup predicted global networks in eight eukaryotes. The Ciona intestinalis network was built from orthologous information alone, yet it recovered a significant number of experimental facts. FunCoup predictions were validated on independent cancer mutation data. We show how FunCoup can be used to discover candidate members of the Parkinson's and Alzheimer's disease pathways. Cross-species pathway conservation analysis provided further support for these observations.
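As a rough illustration of the kind of Bayesian evidence integration described above, the sketch below sums log-likelihood ratios from independent evidence sources for each coupling class and converts the total into a confidence score. It is a generic naive Bayes combination under stated assumptions, not FunCoup's actual implementation: the function name `score_edge`, the class labels in `FC_CLASSES`, the input layout, and all numeric values are hypothetical.

```python
import math

# Hypothetical labels for the four coupling types named in the abstract.
FC_CLASSES = ("physical", "complex", "metabolic", "signaling")


def score_edge(evidence_llrs, prior_odds):
    """Combine evidence for one gene pair, separately for each coupling class.

    evidence_llrs: dict mapping class name -> list of natural-log likelihood
        ratios, one per evidence source (assumed conditionally independent).
    prior_odds: assumed prior odds that a random pair is functionally coupled.
    Returns a dict mapping class name -> posterior probability of coupling.
    """
    scores = {}
    for fc_class in FC_CLASSES:
        # Naive Bayes: log posterior odds = log prior odds + sum of LLRs.
        log_odds = math.log(prior_odds) + sum(evidence_llrs.get(fc_class, []))
        odds = math.exp(log_odds)
        scores[fc_class] = odds / (1.0 + odds)
    return scores


# Illustrative values only: two evidence sources support a physical link,
# one weakly supports a signaling link.
example = score_edge(
    {"physical": [1.2, 0.8], "signaling": [0.3]},
    prior_odds=0.01,
)
```

Keeping a separate score per class mirrors the idea of annotating each edge with support for different interaction types; evidence transferred from orthologs in another species would enter simply as additional log-likelihood ratios in the relevant list.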
