Predicting binding sites of hydrolase-inhibitor complexes by combining several methods

Background: Protein-protein interactions play a critical role in protein function. The completion of many genomes is being followed rapidly by major efforts to identify interacting protein pairs experimentally, in order to decipher the networks of interacting proteins that act in coordination. Identifying protein-protein interaction sites, and detecting the specific amino acids that contribute to the specificity and strength of protein interactions, is an important problem with broad applications ranging from rational drug design to the analysis of metabolic and signal transduction networks.

Results: To increase the power of predictive methods for protein-protein interaction sites, we have developed a consensus methodology that combines four different methods: data mining using Support Vector Machines, threading through protein structures, prediction of conserved residues on the protein surface by analysis of phylogenetic trees, and the Conservatism of Conservatism method of Mirny and Shakhnovich. Results obtained on a dataset of hydrolase-inhibitor complexes demonstrate that the combination of all four methods yields improved predictions over the individual methods.

Conclusions: We developed a consensus method for predicting protein-protein interface residues by combining sequence-based and structure-based methods. The success of our consensus approach suggests that similar methodologies can be developed to improve prediction accuracy for other bioinformatic problems.
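The abstract does not state how the four predictors are combined, so the following is only a minimal sketch of one common consensus scheme, simple vote counting over residue positions. The function name `consensus_interface`, the `min_votes` threshold, and the residue numbers are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Set


def consensus_interface(predictions: Dict[str, Set[int]], min_votes: int = 2) -> Set[int]:
    """Return residues predicted as interface by at least `min_votes` methods.

    `predictions` maps a method name to the set of residue positions that
    method labels as interface. Vote counting is an assumed combination
    rule, not necessarily the one used by the authors.
    """
    votes: Dict[int, int] = {}
    for residues in predictions.values():
        for residue in residues:
            votes[residue] = votes.get(residue, 0) + 1
    return {residue for residue, count in votes.items() if count >= min_votes}


# Made-up predictions from four methods (illustrative residue numbers only).
predictions = {
    "svm": {10, 11, 12, 40},
    "threading": {11, 12, 13},
    "phylogeny": {12, 40, 41},
    "coc": {12, 13, 55},
}

consensus = sorted(consensus_interface(predictions, min_votes=2))
print(consensus)
```

Raising `min_votes` trades coverage for stricter agreement: with `min_votes=4`, only residues flagged by every method are kept.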
