Prediction of protein-protein interaction sites using an ensemble method

Background: Prediction of protein-protein interaction sites is one of the most challenging and intriguing problems in computational biology. Although much progress has been made using various machine learning methods and a variety of available features, the problem is still far from solved.

Results: In this paper, we propose an ensemble method that combines a bootstrap resampling technique, SVM-based fusion classifiers and a weighted voting strategy to overcome the class imbalance problem and make effective use of a wide variety of features. We evaluate the ensemble classifier with 10-fold cross validation on a dataset extracted from 99 polypeptide chains and obtain an AUC score of 0.86, with a sensitivity of 0.76 and a specificity of 0.78, which are better than those of existing methods. To improve the usefulness of the proposed method, two special ensemble classifiers are designed to handle the cases of missing homologues and missing structural information, respectively, and the performance is still encouraging. The robustness of the ensemble method is also evaluated by classifying interaction sites from surface residues as well as from all residues in proteins. Moreover, we demonstrate the applicability of the proposed method by identifying interaction sites in the non-structural proteins (NS) of the influenza A virus, which may be utilized as potential drug target sites.

Conclusion: Our experimental results show that the ensemble classifiers are quite effective in predicting protein interaction sites. The Sub-EnClassifiers with the resampling technique can alleviate the class imbalance problem, and combining Sub-EnClassifiers across a wide variety of feature groups can significantly improve prediction performance.
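The combination of bootstrap resampling and weighted voting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the base learner here is a simple nearest-centroid classifier standing in for the SVMs used in the paper, the voting weight (balanced accuracy on the training data) is an assumed choice, the feature-group fusion step is omitted, and all names and the toy data are hypothetical.

```python
import random


def balanced_bootstrap(pos, neg, rng):
    """Draw, with replacement, as many negatives as positives (balanced sample)."""
    n = len(pos)
    return ([rng.choice(pos) for _ in range(n)],
            [rng.choice(neg) for _ in range(n)])


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidClassifier:
    """Stand-in base learner (NOT an SVM): predicts the class of the nearer centroid."""

    def fit(self, pos, neg):
        self.cp, self.cn = centroid(pos), centroid(neg)
        return self

    def predict(self, x):
        return 1 if sq_dist(x, self.cp) < sq_dist(x, self.cn) else 0


def balanced_accuracy(clf, pos, neg):
    """Mean of sensitivity and specificity; used here as the voting weight (assumption)."""
    sens = sum(clf.predict(x) == 1 for x in pos) / len(pos)
    spec = sum(clf.predict(x) == 0 for x in neg) / len(neg)
    return (sens + spec) / 2


def train_ensemble(pos, neg, n_members=11, seed=0):
    """Train one base learner per balanced bootstrap sample."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        bp, bn = balanced_bootstrap(pos, neg, rng)
        clf = CentroidClassifier().fit(bp, bn)
        members.append((clf, balanced_accuracy(clf, pos, neg)))
    return members


def predict_ensemble(members, x):
    """Weighted vote: positive if positive-voting weight exceeds half the total weight."""
    total = sum(w for _, w in members)
    vote = sum(w for clf, w in members if clf.predict(x) == 1)
    return 1 if vote > total / 2 else 0


# Hypothetical toy data: 20 "interface" points near (2, 2), 200 "non-interface" near (0, 0).
data_rng = random.Random(42)
positives = [[data_rng.gauss(2, 0.5), data_rng.gauss(2, 0.5)] for _ in range(20)]
negatives = [[data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)] for _ in range(200)]

members = train_ensemble(positives, negatives)
print(predict_ensemble(members, [2.0, 2.0]), predict_ensemble(members, [0.0, 0.0]))
```

Because each member sees equal numbers of positive and negative examples, no single learner is dominated by the majority (non-interface) class, while repeating the resampling lets the ensemble as a whole draw on many different negatives.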
