Prediction of Hot Spots Based on Physicochemical Features and Relative Accessible Surface Area of Amino Acid Sequence

Hot spots are central to understanding the mechanisms of protein-protein interactions and can serve as targets in drug design. Because experimental methods are costly and time-consuming, computational methods that use sequence or structure information are widely applied to hot spot prediction. Here, we propose a new sequence-based model that combines physicochemical features with the relative accessible surface area of the amino acid sequence. The model consists of 83 classifiers built with the IBk algorithm, where the instances for each classifier are encoded by the corresponding property drawn from the 544 properties in the AAindex1 database. The top-performing classifiers, ranked by F1 score, are then combined into an ensemble by majority voting. The model outperforms other state-of-the-art computational methods, yielding an F1 score of 0.80 on the BID test set.
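The selection-and-voting scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: IBk is treated as a plain k-nearest-neighbour classifier, the three property tables hold made-up values rather than real AAindex1 entries, and the window encoding, the value of `k`, and all function names are hypothetical.

```python
from collections import Counter

# Toy property tables: hypothetical values, NOT real AAindex1 entries.
PROPERTIES = {
    "hydrophobicity_like": {"A": 0.6, "G": 0.5, "R": 0.0, "S": 0.4, "W": 0.9, "Y": 0.7},
    "volume_like": {"A": 0.2, "G": 0.0, "R": 0.8, "S": 0.2, "W": 1.0, "Y": 0.9},
    "charge_like": {"A": 0.0, "G": 0.0, "R": 1.0, "S": 0.0, "W": 0.0, "Y": 0.0},
}


def encode(window, rasa, table):
    """Encode a residue window with one property, then append the
    relative accessible surface area (assumed encoding)."""
    return [table[aa] for aa in window] + [rasa]


def knn_predict(train_x, train_y, x, k=3):
    """Plain k-nearest-neighbour vote using squared Euclidean distance."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(tx, x)), y)
        for tx, y in zip(train_x, train_y)
    )
    return Counter(y for _, y in ranked[:k]).most_common(1)[0][0]


def f1_score(y_true, y_pred):
    """F1 for the positive (hot spot) class, labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def classify(name, train, window, rasa, k=3):
    """Prediction of the single classifier tied to property `name`."""
    table = PROPERTIES[name]
    xs = [encode(w, r, table) for w, r, _ in train]
    ys = [y for _, _, y in train]
    return knn_predict(xs, ys, encode(window, rasa, table), k)


def select_top(train, valid, n_top=2):
    """Rank the per-property classifiers by F1 on a validation set."""
    truth = [y for _, _, y in valid]
    scores = {
        name: f1_score(truth, [classify(name, train, w, r) for w, r, _ in valid])
        for name in PROPERTIES
    }
    return sorted(scores, key=scores.get, reverse=True)[:n_top]


def majority_vote(labels):
    """Most common label among the selected classifiers' predictions."""
    return Counter(labels).most_common(1)[0][0]


def predict(selected, train, window, rasa):
    """Ensemble prediction: majority vote over the selected classifiers."""
    return majority_vote([classify(n, train, window, rasa) for n in selected])
```

Each sample is a `(window, rasa, label)` tuple, so a call such as `predict(select_top(train, valid, n_top=3), train, "WYR", 0.1)` returns 1 for a predicted hot spot and 0 otherwise. An odd number of selected classifiers avoids ties in the vote.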
