Prediction of protein–protein interaction sites from weakly homologous template structures using meta‐threading and machine learning

The identification of protein–protein interactions is vital for understanding protein function, elucidating interaction mechanisms, and for practical applications in drug discovery. As protein sequence data grow exponentially, fully automated computational methods that predict interactions between proteins are becoming essential components of system‐level function inference. A thorough analysis of protein complex structures demonstrated that binding site locations as well as the interfacial geometry are highly conserved across evolutionarily related proteins. Because the conformational space of protein–protein interactions is well covered by experimental structures, sensitive protein threading techniques can be used to identify suitable templates for the accurate prediction of interfacial residues. Toward this goal, we developed eFindSitePPI, an algorithm that uses the three‐dimensional structure of a target protein, evolutionarily remotely related templates, and machine learning techniques to predict binding residues. Using crystal structures, the average sensitivity (specificity) of eFindSitePPI in interfacial residue prediction is 0.46 (0.92). For weakly homologous protein models, these values decrease only slightly to 0.40–0.43 (0.91–0.92), demonstrating that eFindSitePPI not only performs well on experimental structures but also tolerates structural imperfections in computer‐generated models. In addition, eFindSitePPI detects specific molecular interactions at the interface; for instance, it correctly predicts approximately one half of hydrogen bonds and aromatic interactions, as well as one third of salt bridges and hydrophobic contacts. Comparative benchmarks against several dimer datasets show that eFindSitePPI outperforms other methods for protein‐binding residue prediction. It also features a carefully tuned confidence estimation system, which is particularly useful in large‐scale applications using raw genomic data.
eFindSitePPI is freely available to the academic community at http://www.brylinski.org/efindsiteppi. Copyright © 2014 John Wiley & Sons, Ltd.
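
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how sensitivity and specificity are conventionally computed for per-residue interface prediction. It is not taken from eFindSitePPI; the function name `sensitivity_specificity` and the residue numbers are hypothetical and chosen only for illustration.

```python
def sensitivity_specificity(predicted, actual, all_residues):
    """Return (sensitivity, specificity) for a per-residue binary prediction.

    predicted    -- residue indices predicted to be interfacial
    actual       -- residue indices that are truly interfacial
    all_residues -- every residue index in the target protein
    """
    predicted, actual, all_residues = set(predicted), set(actual), set(all_residues)

    tp = len(predicted & actual)                  # interfacial and predicted
    fn = len(actual - predicted)                  # interfacial but missed
    fp = len(predicted - actual)                  # predicted but not interfacial
    tn = len(all_residues - predicted - actual)   # correctly left unpredicted

    # Standard definitions; guard against empty classes.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical 20-residue protein (illustrative numbers, not real data).
residues = range(1, 21)
true_interface = {3, 4, 5, 6, 7}
predicted_interface = {4, 5, 8}

sens, spec = sensitivity_specificity(predicted_interface, true_interface, residues)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Because interfacial residues are usually a small fraction of a protein, specificity tends to be high even for conservative predictors, which is why the two values are best read together.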
