Prediction of Protein Hotspots from Whole Protein Sequences by a Random Projection Ensemble System

Hotspot residues play an important role in protein-protein interactions and often perform specific functions in biological processes. Hotspot residues are commonly determined by alanine scanning mutagenesis experiments, which are costly and time consuming. To address this issue, computational methods have been developed. Most of them are structure based, i.e., they use information from solved protein structures. However, the number of solved protein structures is far smaller than the number of known sequences. Moreover, almost all existing predictors identify hotspots from the interfaces of protein complexes and seldom from whole protein sequences. Therefore, methods that determine hotspots from whole protein sequences using sequence information alone are needed. To this end, we propose an ensemble system with random projections that uses statistical physicochemical properties of amino acids. First, an encoding scheme involving sequence profiles of residues and physicochemical properties from the AAindex1 dataset is developed. Next, the random projection technique is used to project the encoded instances into a reduced space. Several of the better-performing random projections are then selected by training an IBk classifier on the training dataset and are applied to the test dataset, yielding an ensemble of random projection classifiers. Experimental results show that, although the performance of our method is not yet good enough for real applications, it is promising for determining hotspot residues from whole sequences.
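The pipeline outlined in the abstract (project encoded instances into a reduced space, keep the better projections as judged by a nearest-neighbour classifier, then combine them) can be illustrated with a minimal sketch. This is not the authors' implementation. IBk is Weka's k-nearest-neighbour classifier, so a plain NumPy k-NN stands in for it here. The Gaussian projection matrix, the held-out validation split used for ranking, the majority-vote combination, all parameter values, and the function names `random_projection_matrix`, `knn_predict`, `select_projections`, and `ensemble_predict` are assumptions made for illustration. The synthetic features in the demo merely stand in for the sequence-profile and AAindex1 encoding.

```python
import numpy as np


def random_projection_matrix(n_features, n_components, rng):
    # Gaussian random matrix (an assumed choice), scaled so that squared
    # distances are preserved in expectation after projection.
    return rng.normal(size=(n_features, n_components)) / np.sqrt(n_components)


def knn_predict(X_train, y_train, X_query, k=3):
    # Plain k-nearest-neighbour vote on binary labels (0 = non-hotspot, 1 = hotspot).
    diff = X_query[:, None, :] - X_train[None, :, :]
    dist = (diff ** 2).sum(axis=2)                  # squared Euclidean distances
    nearest = np.argsort(dist, axis=1)[:, :k]       # indices of the k closest training rows
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)


def select_projections(X_train, y_train, X_val, y_val,
                       n_candidates=20, n_keep=5, n_components=8, k=3, seed=0):
    # Draw candidate projections, score each by k-NN accuracy on held-out
    # data, and keep the best n_keep (hypothetical selection rule).
    rng = np.random.default_rng(seed)
    scored = []
    for _ in range(n_candidates):
        R = random_projection_matrix(X_train.shape[1], n_components, rng)
        pred = knn_predict(X_train @ R, y_train, X_val @ R, k)
        scored.append(((pred == y_val).mean(), R))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [R for _, R in scored[:n_keep]]


def ensemble_predict(projections, X_train, y_train, X_query, k=3):
    # Majority vote over the k-NN classifiers built in each retained subspace.
    votes = np.stack([knn_predict(X_train @ R, y_train, X_query @ R, k)
                      for R in projections])
    return (votes.mean(axis=0) > 0.5).astype(int)


if __name__ == "__main__":
    # Synthetic stand-in data: 40 features per residue, two shifted classes.
    rng = np.random.default_rng(42)
    y = rng.integers(0, 2, size=400)
    X = rng.normal(size=(400, 40)) + 3.0 * y[:, None]
    X_tr, y_tr = X[:200], y[:200]
    X_va, y_va = X[200:300], y[200:300]
    X_te, y_te = X[300:], y[300:]
    projections = select_projections(X_tr, y_tr, X_va, y_va)
    predictions = ensemble_predict(projections, X_tr, y_tr, X_te)
    print("ensemble accuracy on synthetic data:", (predictions == y_te).mean())
```

Keeping an odd number of projections and an odd `k` avoids tied votes in this binary setting. Real hotspot data are heavily imbalanced, so accuracy alone would be a poor ranking criterion in practice.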
