Protein interaction networks revealed by proteome coevolution

Predicting protein pairs

Biological function is driven by interactions between proteins. High-throughput experimental techniques have provided large datasets of protein interactions in several organisms; however, much of the combinatorial space remains uncharted. Cong et al. predict protein interfaces by identifying coevolving residues in aligned protein sequences (see the Perspective by Vajda and Emili). In comparisons against gold-standard and negative control sets, they show that the accuracy is higher than that of proteome-wide two-hybrid and mass spectrometry screens. The approach predicts 1618 protein interactions in Escherichia coli, 682 of which were unanticipated, and 911 interacting pairs in Mycobacterium tuberculosis, most of which had not been previously described. With an expected false-positive rate of between 10 and 20%, the predicted interactions and networks provide an excellent starting point for further study.

Science, this issue p. 185; see also p. 120

A computational approach reveals hundreds of protein-protein interactions in Escherichia coli and Mycobacterium tuberculosis.

Residue-residue coevolution has been observed across a number of protein-protein interfaces, but the extent of residue coevolution between protein families on the whole-proteome scale has not been systematically studied. We investigate coevolution between 5.4 million pairs of proteins in Escherichia coli and between 3.9 million pairs in Mycobacterium tuberculosis. We find strong coevolution for binary complexes involved in metabolism and weaker coevolution for larger complexes playing roles in genetic information processing. We take advantage of this coevolution, in combination with structure modeling, to predict protein-protein interactions (PPIs) with an accuracy that benchmark studies suggest is considerably higher than that of proteome-wide two-hybrid and mass spectrometry screens. We identify hundreds of previously uncharacterized PPIs in E. coli and M. tuberculosis that both add components to known protein complexes and networks and establish the existence of new ones.
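
As a rough illustration of what "coevolving residues in aligned protein sequences" means, the sketch below scores every inter-protein column pair of a paired alignment by plain mutual information. This is a deliberately simplified stand-in and not the method used in the study. The function names, the toy alignment, and the assumption that each row concatenates protein A and protein B from the same genome are all hypothetical.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (in nats) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())


def mutual_information(col_a, col_b):
    """MI between two alignment columns: H(A) + H(B) - H(A, B)."""
    joint = list(zip(col_a, col_b))
    return entropy(col_a) + entropy(col_b) - entropy(joint)


def inter_protein_mi(paired_rows, len_a):
    """Score every (column of A, column of B) pair.

    paired_rows: one string per genome, protein A's aligned sequence
                 followed by protein B's (hypothetical input layout).
    len_a:       number of alignment columns that belong to protein A.
    Returns a len_a x len_b matrix of MI scores as nested lists.
    """
    columns = list(zip(*paired_rows))
    a_cols, b_cols = columns[:len_a], columns[len_a:]
    return [[mutual_information(a, b) for b in b_cols] for a in a_cols]


# Toy paired alignment: 2 columns from protein A, 2 from protein B.
toy_rows = ["AKLE", "AKLE", "ARLD", "ARLD", "GKVE", "GRVD"]
scores = inter_protein_mi(toy_rows, len_a=2)
for i, row in enumerate(scores):
    for j, value in enumerate(row):
        print(f"A[{i}] vs B[{j}]: MI = {value:.3f}")
```

In the toy data, column 0 of A varies in lockstep with column 0 of B, and column 1 of A with column 1 of B, so those two pairs score high while the mismatched pairs score near zero. Raw mutual information is confounded by phylogeny and by indirect correlations, which is why a real analysis would need a more careful statistical model than this one.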
