Robust and accurate prediction of residue–residue interactions across protein interfaces using evolutionary information

Do the amino acid identities of residues that make contact across protein interfaces covary during evolution? If so, such covariance could be used to predict contacts across interfaces and to assemble models of biological complexes. Using a pseudo-likelihood-based method, we identify residue pairs that covary across protein–protein interfaces in the 50S ribosomal subunit and in 28 additional bacterial protein complexes with known structure. These pairs are almost always in contact in the complex, provided that the number of aligned sequences is greater than the average length of the two proteins. We use this method to make subunit contact predictions for an additional 36 protein complexes with unknown structures, and present models based on these predictions for the tripartite ATP-independent periplasmic (TRAP) transporter, the tripartite efflux system, the pyruvate formate lyase-activating enzyme complex, and the methionine ABC transporter. DOI: http://dx.doi.org/10.7554/eLife.02030.001
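The sequence-depth condition and the idea of scoring inter-protein column pairs can be illustrated with a minimal Python sketch. This is not the pseudo-likelihood method named in the abstract: plain mutual information is swapped in as a much simpler covariation score, purely for illustration. The function names, the toy alignment, and the assumption that each row is protein A concatenated with its paired protein B from the same genome are all hypothetical.

```python
from collections import Counter
from math import log


def enough_sequences(n_aligned, len_a, len_b):
    """Rule of thumb from the abstract: the number of aligned sequences
    should exceed the average length of the two proteins."""
    return n_aligned > (len_a + len_b) / 2


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns.
    Assumption: a simple stand-in score, not the pseudo-likelihood model."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * log((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in count_ab.items()
    )


def top_interface_pairs(paired_alignment, len_a, k=5):
    """Score every (position in A, position in B) column pair and return
    the k highest-scoring ones as (score, index_in_A, index_in_B).

    paired_alignment: equal-length strings, each one protein A followed by
    its partner protein B (hypothetical input format)."""
    columns = list(zip(*paired_alignment))
    total = len(columns)
    scores = [
        (mutual_information(columns[i], columns[j]), i, j - len_a)
        for i in range(len_a)
        for j in range(len_a, total)
    ]
    scores.sort(reverse=True)
    return scores[:k]


# Toy paired alignment: A is the first 2 columns, B is the last 2.
# Column 0 of A and column 1 of B change together; the others are constant.
toy_alignment = ["AKLD", "AKLD", "GKLE", "GKLE", "AKLD", "GKLE"]
```

On the toy alignment, `top_interface_pairs(toy_alignment, len_a=2, k=1)` ranks the pair (position 0 in A, position 1 in B) first, since it is the only pair that varies in a correlated way. Real alignments need corrections for phylogeny and indirect couplings, which is what global statistical models such as the pseudo-likelihood approach are meant to address.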
