Prediction of hot spots in protein interfaces using extreme learning machines with the information of spatial neighbour residues.

The identification of hot spots, a small subset of protein interfaces that accounts for the majority of binding free energy, is becoming increasingly important for the research on protein-protein interaction and drug design. For each interface residue or target residue to be predicted, the authors extract hybrid features which incorporate a wide range of information of the target residue and its spatial neighbor residues, that is, the nearest contact residue in the other face (mirror-contact residue) and the nearest contact residue in the same face (intra-contact residue). Here, feature selection is performed using random forests to avoid over-fitting. Thereafter, the extreme learning machine is employed to effectively integrate these hybrid features for predicting hot spots in protein interfaces. By the 5-fold cross validation in the training set, their method can achieve accuracy (ACC) of 82.1% and Matthew's correlation coefficient (MCC) of 0.459, and outperforms some alternative machine learning methods in the comparison study. Furthermore, their method achieves ACC of 76.8% and MCC of 0.401 in the independent test set, and is more effective than the major existing hot spot predictors. Their prediction method offers a powerful tool for uncovering candidate residues in the studies of alanine scanning mutagenesis for functional protein interaction sites.

[1]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.

[2]  F A Quiocho,et al.  Target enzyme recognition by calmodulin: 2.4 A structure of a calmodulin-peptide complex. , 1992, Science.

[3]  Jaime Prilusky,et al.  Automated analysis of interatomic contacts in proteins , 1999, Bioinform..

[4]  M. Dwyer,et al.  Peptide exosite inhibitors of factor VIIa as anticoagulants , 2000, Nature.

[5]  Minoru Kanehisa,et al.  AAindex: Amino Acid index database , 2000, Nucleic Acids Res..

[6]  Kurt S. Thorn,et al.  ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions , 2001, Bioinform..

[7]  Peter A. Kollman,et al.  Computational alanine scanning of the 1:1 human growth hormone–receptor complex , 2002, J. Comput. Chem..

[8]  D. Baker,et al.  A simple physical model for binding energy hot spots in protein–protein complexes , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[9]  L. Serrano,et al.  Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. , 2002, Journal of molecular biology.

[10]  Guoli Wang,et al.  PISCES: a protein sequence culling server , 2003, Bioinform..

[11]  D. Bailey,et al.  The Binding Interface Database (BID): A Compilation of Amino Acid Hot Spots in Protein Interfaces , 2003, Bioinform..

[12]  Jie Liang,et al.  Protein-protein interactions: hot spots and structurally conserved residues often locate in complemented pockets that pre-organized in the unbound states: implications for docking. , 2004, Journal of molecular biology.

[13]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[14]  R. Nussinov,et al.  Hot regions in protein--protein interactions: the organization and contribution of structurally conserved hot spot residues. , 2005, Journal of molecular biology.

[15]  A. del Sol,et al.  Small‐world network approach to identify key residues in protein–protein interaction , 2004, Proteins.

[16]  Chee Kheong Siew,et al.  Extreme learning machine: Theory and applications , 2006, Neurocomputing.

[17]  Pedro A Fernandes,et al.  Hot spots—A review of the protein–protein interface determinant amino‐acid residues , 2007, Proteins.

[18]  M. Šikić,et al.  PSAIA – Protein Structure and Interaction Analyzer , 2008, BMC Structural Biology.

[19]  Andy Liaw,et al.  Classification and Regression by randomForest , 2007 .

[20]  H. Wolfson,et al.  Spatial chemical conservation of hot spot interactions in protein-protein complexes , 2007, BMC Biology.

[21]  Christopher L. McClendon,et al.  Reaching for high-hanging fruit in drug discovery at protein–protein interfaces , 2007, Nature.

[22]  Burkhard Rost,et al.  Protein–Protein Interaction Hotspots Carved into Sequences , 2007, PLoS Comput. Biol..

[23]  Julie C. Mitchell,et al.  An automated decision‐tree approach to predicting protein interaction hot spots , 2007, Proteins.

[24]  Xiang-Sun Zhang,et al.  Bridging protein local structures and protein functions , 2008, Amino Acids.

[25]  Doheon Lee,et al.  A feature-based approach to modeling protein–protein interaction hot spots , 2009, Nucleic acids research.

[26]  Ozlem Keskin,et al.  Identification of computational hot spots in protein interfaces: combining solvent accessibility and inter-residue potentials improves the accuracy , 2009, Bioinform..

[27]  Xing-Ming Zhao,et al.  APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility , 2010, BMC Bioinformatics.

[28]  Jinyan Li,et al.  ‘Double water exclusion’: a hypothesis refining the O-ring theory for the hot spots at protein interfaces , 2009, Bioinform..

[29]  Salam A. Assi,et al.  PCRPi: Presaging Critical Residues in Protein interfaces, a new computational tool to chart hot spots in protein interfaces , 2009, Nucleic acids research.

[30]  Ozlem Keskin,et al.  Analysis and network representation of hotspots in protein interfaces using minimum cut trees , 2010, Proteins.

[31]  Julie C. Mitchell,et al.  KFC2: A knowledge‐based hot spot prediction method based on interface solvation, atomic density, and plasticity features , 2011, Proteins.

[32]  Zhenhua Li,et al.  Structural analysis of the hot spots in the binding between H1N1 HA and the 2D1 antibody: do mutations of H1N1 from 1918 to 2009 affect much on this binding? , 2011, Bioinform..

[33]  Bin Xu,et al.  A semi-supervised boosting SVM for predicting hot spots at protein-protein Interfaces , 2012, BMC Systems Biology.

[34]  Xiang-Sun Zhang,et al.  Prediction of hot spots in protein interfaces using a random forest model with hybrid features. , 2012, Protein engineering, design & selection : PEDS.