Residue co-evolution helps predict interaction sites in α-helical membrane proteins.

Many integral membrane proteins, just like their globular counterparts, form either transient or permanent multi-subunit complexes to fulfill specific cellular roles. Although numerous interactions between these proteins have been experimentally determined, the structural coverage of the complexes is very low. Therefore, the computational identification of the amino acid residues involved in the interaction interfaces is a crucial step towards the functional annotation of all membrane proteins. Here, we present MBPred, a sequence-based method for predicting the interface residues in transmembrane proteins. A unique feature of our method is that it contains separate random forest models for two different use cases: (a) when the location of transmembrane regions is precisely known from a crystal structure, and (b) when it is predicted from sequence. In stark contrast to the aqueous-exposed protein segments, we found that the interaction sites located in the membrane are not enriched for evolutionary conservation, most likely due to their restricted amino acid composition or their random distribution among buried and exposed residues. On the other hand, residue co-evolution proved to be a very informative feature that has not previously been used for predicting interaction sites in individual proteins. MBPred reaches AUC, precision and recall values of 0.79/0.73, 0.69/0.51 and 0.55/0.48 on the cross-validation and independent test datasets, respectively, thus outperforming the previously published method of Bordner as well as all methods trained on globular proteins. Moreover, we show that for the majority of complete interface patches, the method captures more than 50% of the involved residues.
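To make the reported evaluation measures concrete, the following minimal sketch shows how precision, recall and AUC can be computed from per-residue interface scores. It is an illustration only and not the MBPred evaluation code: the function names, the 0.5 score threshold and the toy labels and scores are assumptions chosen for the example, and they do not reproduce any value reported above.

```python
from typing import Sequence, Tuple


def precision_recall(labels: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> Tuple[float, float]:
    """Precision and recall when residues scoring >= threshold are called interface.

    The 0.5 threshold is an assumption for this sketch.
    """
    tp = fp = fn = 0
    for label, score in zip(labels, scores):
        predicted_interface = score >= threshold
        if predicted_interface and label == 1:
            tp += 1
        elif predicted_interface and label == 0:
            fp += 1
        elif not predicted_interface and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random interface residue outscores a
    random non-interface residue (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = interface residue, 0 = non-interface residue.
toy_labels = [1, 1, 0, 1, 0, 0, 0, 1]
toy_scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.8]

toy_precision, toy_recall = precision_recall(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_precision, toy_recall, toy_auc)
```

Precision and recall depend on the chosen score threshold, whereas AUC summarizes the ranking quality over all thresholds, which is why the three values are reported together.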
