Systematic computational prediction of protein interaction networks

Determining the network of physical protein associations is an important first step in building mechanistic evidence that elucidates biological pathways. Despite rapid advances in high-throughput experiments for determining protein interactions, most associations remain unknown. Here we describe computational methods that substantially expand protein association networks. We describe methods for integrating multiple independent sources of evidence to obtain higher-quality predictions, and we compare the major publicly available resources that experimentalists can use.
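One common way to integrate independent evidence sources is a naive Bayes combination of likelihood ratios. It is shown here only as an illustration and is not necessarily the specific method used in this work. The sketch assumes each evidence source is conditionally independent given whether a protein pair interacts. Each likelihood ratio is estimated from counts in gold-standard positive and negative pair sets. The function names and all example numbers (the prior odds of 1/600 and the per-source ratios) are hypothetical.

```python
import math


def likelihood_ratio(pos_with_evidence, pos_total, neg_with_evidence, neg_total):
    """Estimate LR = P(evidence | interacting) / P(evidence | non-interacting)
    from counts in a gold-standard positive set and a gold-standard negative set."""
    p_given_pos = pos_with_evidence / pos_total
    p_given_neg = neg_with_evidence / neg_total
    return p_given_pos / p_given_neg


def combine_likelihood_ratios(likelihood_ratios, prior_odds):
    """Naive Bayes combination: if the sources are conditionally independent,
    posterior odds = prior odds * product of the per-source likelihood ratios."""
    posterior_odds = prior_odds
    for lr in likelihood_ratios:
        posterior_odds *= lr
    return posterior_odds


def odds_to_probability(odds):
    """Convert odds O into a probability O / (1 + O)."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical source: 80 of 1,000 positive pairs and 40 of 100,000
    # negative pairs carry this evidence, giving a likelihood ratio of about 200.
    print(likelihood_ratio(80, 1000, 40, 100000))

    # Hypothetical prior odds of 1/600 and three sources with LRs 10, 5, 20.
    odds = combine_likelihood_ratios([10, 5, 20], prior_odds=1 / 600)
    print(odds_to_probability(odds))  # about 0.625
```

Multiplying likelihood ratios is the simplest integration scheme. If the sources are correlated (for example, two co-expression datasets from overlapping experiments), their ratios double-count shared evidence. The sources should then be merged into a single feature before the product is taken, or a classifier that models dependencies should be used instead.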
