Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information

BackgroundProtein-protein interactions play essential roles in protein function determination and drug design. Numerous methods have been proposed to recognize their interaction sites, however, only a small proportion of protein complexes have been successfully resolved due to the high cost. Therefore, it is important to improve the performance for predicting protein interaction sites based on primary sequence alone.ResultsWe propose a new idea to construct an integrative profile for each residue in a protein by combining its hydrophobic and evolutionary information. A support vector machine (SVM) ensemble is then developed, where SVMs train on different pairs of positive (interface sites) and negative (non-interface sites) subsets. The subsets having roughly the same sizes are grouped in the order of accessible surface area change before and after complexation. A self-organizing map (SOM) technique is applied to group similar input vectors to make more accurate the identification of interface residues. An ensemble of ten-SVMs achieves an MCC improvement by around 8% and F1 improvement by around 9% over that of three-SVMs. As expected, SVM ensembles constantly perform better than individual SVMs. In addition, the model by the integrative profiles outperforms that based on the sequence profile or the hydropathy scale alone. As our method uses a small number of features to encode the input vectors, our model is simpler, faster and more accurate than the existing methods.ConclusionsThe integrative profile by combining hydrophobic and evolutionary information contributes most to the protein-protein interaction prediction. Results show that evolutionary context of residue with respect to hydrophobicity makes better the identification of protein interface residues. In addition, the ensemble of SVM classifiers improves the prediction performance.AvailabilityDatasets and software are available at http://mail.ustc.edu.cn/~bigeagle/BMCBioinfo2010/index.htm.

[1]  David R. Westhead,et al.  Improved prediction of protein-protein binding sites using a support vector machines approach. , 2005, Bioinformatics.

[2]  P. Bourne,et al.  Exploiting sequence and structure homologs to identify protein–protein binding sites , 2005, Proteins.

[3]  A J Olson,et al.  Morphology of protein-protein interfaces. , 1998, Structure.

[4]  Huan-Xiang Zhou,et al.  Prediction of interface residues in protein–protein complexes by a consensus neural network method: Test against NMR data , 2005, Proteins.

[5]  R. Laskowski SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. , 1995, Journal of molecular graphics.

[6]  Kristian Vlahovicek,et al.  Prediction of Protein–Protein Interaction Sites in Sequences and 3D Structures by Random Forests , 2009, PLoS Comput. Biol..

[7]  Ludmila I. Kuncheva,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2004 .

[8]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[9]  Kurt S. Thorn,et al.  ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions , 2001, Bioinform..

[10]  C. Sander,et al.  Database of homology‐derived protein structures and the structural meaning of sequence alignment , 1991, Proteins.

[11]  Alfonso Valencia,et al.  Progress and challenges in predicting protein-protein interaction sites , 2008, Briefings Bioinform..

[12]  Xiuquan Du,et al.  Improved Prediction of Protein Binding Sites from Sequences Using Genetic Algorithm , 2009, The protein journal.

[13]  Li Liu,et al.  Prediction of protein interactions by combining genetic algorithm with SVM method , 2007, 2007 IEEE Congress on Evolutionary Computation.

[14]  Susan Jones,et al.  SHARP2: protein-protein interaction predictions using patch analysis , 2006, Bioinform..

[15]  W. Kauzmann Some factors in the interpretation of protein denaturation. , 1959, Advances in protein chemistry.

[16]  Xue-wen Chen,et al.  Sequence-based prediction of protein interaction sites with an integrative method , 2009, Bioinform..

[17]  Ruben Abagyan,et al.  Statistical analysis and prediction of protein–protein interfaces , 2005, Proteins.

[18]  J. Janin,et al.  Dissecting protein–protein recognition sites , 2002, Proteins.

[19]  R. Doolittle,et al.  A simple method for displaying the hydropathic character of a protein. , 1982, Journal of molecular biology.

[20]  Alexandre M J J Bonvin,et al.  How proteins get in touch: interface prediction in the study of biomolecular complexes. , 2008, Current protein & peptide science.

[21]  P. Chakrabarti,et al.  Conservation and relative importance of residues across protein-protein interfaces , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[22]  Aleksey A. Porollo,et al.  Prediction‐based fingerprints of protein–protein interactions , 2006, Proteins.

[23]  R. Raz,et al.  ProMate: a structure based prediction program to identify the location of protein-protein binding sites. , 2004, Journal of molecular biology.

[24]  Olivier Lichtarge,et al.  BIOINFORMATICS ORIGINAL PAPER Systems biology , 2004 .

[25]  M. Charton,et al.  The structural dependence of amino acid hydrophobicity parameters. , 1982, Journal of theoretical biology.

[26]  T. Blundell,et al.  Distinguishing structural and functional restraints in evolution in order to identify interaction sites. , 2004, Journal of molecular biology.

[27]  J. Davies,et al.  Molecular Biology of the Cell , 1983, Bristol Medico-Chirurgical Journal.

[28]  Tobias Müller,et al.  Modelling interaction sites in protein domains with interaction profile hidden Markov models , 2006, Bioinform..

[29]  S. Jones,et al.  Prediction of protein-protein interaction sites using patch analysis. , 1997, Journal of molecular biology.

[30]  J. Janin,et al.  Dissecting subunit interfaces in homodimeric proteins , 2003, Proteins.

[31]  J. Janin,et al.  A dissection of specific and non-specific protein-protein interfaces. , 2004, Journal of molecular biology.

[32]  A. Thomas,et al.  A fast method to predict protein interaction sites from sequences. , 2000, Journal of molecular biology.

[33]  A. Valencia,et al.  In silico two‐hybrid system for the selection of physically interacting protein pairs , 2002, Proteins.

[34]  Xiaolong Wang,et al.  Exploiting residue-level and profile-level interface propensities for usage in binding sites prediction of proteins , 2007, BMC Bioinformatics.

[35]  Bonnie Berger,et al.  Struct2Net: Integrating Structure into Protein-Protein Interaction Prediction , 2005, Pacific Symposium on Biocomputing.

[36]  B. Rost,et al.  Analysing six types of protein-protein interfaces. , 2003, Journal of molecular biology.

[37]  José María Carazo,et al.  Smoothly distributed fuzzy c-means: a new self-organizing map , 2001, Pattern Recognit..

[38]  Joël Janin,et al.  Specific versus non-specific contacts in protein crystals , 1997, Nature Structural Biology.

[39]  Huan‐Xiang Zhou,et al.  Prediction of protein interaction sites from sequence profile and residue neighbor list , 2001, Proteins.

[40]  A. Valencia,et al.  Prediction of protein--protein interaction sites in heterocomplexes with neural networks. , 2002, European journal of biochemistry.

[41]  Simon Parsons,et al.  Bioinformatics: The Machine Learning Approach by P. Baldi and S. Brunak, 2nd edn, MIT Press, 452 pp., $60.00, ISBN 0-262-02506-X , 2004, The Knowledge Engineering Review.

[42]  Peng Chen,et al.  Predicting protein interaction sites from residue spatial sequence profile and evolution rate , 2006, FEBS Letters.

[43]  M. Šikić,et al.  PSAIA – Protein Structure and Interaction Analyzer , 2008, BMC Structural Biology.

[44]  Hau-San Wong,et al.  Prediction of protein B-factors using multi-class bounded SVM. , 2007, Protein and peptide letters.

[45]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[46]  Pamela A. Silver,et al.  Identification of an Evolutionarily Conserved Domain in Human Lens Epithelium-derived Growth Factor/Transcriptional Co-activator p75 (LEDGF/p75) That Binds HIV-1 Integrase* , 2004, Journal of Biological Chemistry.

[47]  Josef Kittler,et al.  Sum Versus Vote Fusion in Multiple Classifier Systems , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[48]  A. Engelman,et al.  Structural basis for the recognition between HIV-1 integrase and transcriptional coactivator p75. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[49]  Pierre Baldi,et al.  Bioinformatics - the machine learning approach (2. ed.) , 2000 .

[50]  Oleksandr Makeyev,et al.  Neural network with ensembles , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[51]  S. Jones,et al.  Principles of protein-protein interactions. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[52]  B. Rost,et al.  Predicted protein–protein interaction sites from local sequence information , 2003, FEBS letters.

[53]  Teuvo Kohonen,et al.  Self-Organizing Maps , 2010 .

[54]  J. Bezdek,et al.  FCM: The fuzzy c-means clustering algorithm , 1984 .

[55]  S. Jones,et al.  Analysis of protein-protein interaction sites using surface patches. , 1997, Journal of molecular biology.

[56]  Hau-San Wong,et al.  3D head model retrieval in kernel feature space using HSOM , 2008, Pattern Recognit..

[57]  N. Ben-Tal,et al.  Residue frequencies and pairing preferences at protein–protein interfaces , 2001, Proteins.

[58]  R. Kini,et al.  Prediction of potential protein‐protein interaction sites from amino acid sequence , 1996, FEBS letters.

[59]  Feng Dan,et al.  Specific and non-specific contacts in protein crystals. , 2004, Protein and peptide letters.

[60]  C. Chothia,et al.  The atomic structure of protein-protein recognition sites. , 1999, Journal of molecular biology.

[61]  Sarah A. Teichmann,et al.  3D Complex: A Structural Classification of Protein Complexes , 2006, PLoS Comput. Biol..

[62]  Burkhard Rost,et al.  ISIS: interaction sites identified from sequence , 2007, Bioinform..

[63]  T J Bollenbach,et al.  Kinetic linked-function analysis of the multiligand interactions on Mg(2+)-activated yeast pyruvate kinase. , 2001, Biochemistry.

[64]  A. Bulpitt,et al.  Insights into protein-protein interfaces using a Bayesian network prediction method. , 2006, Journal of molecular biology.