Analysis of hot regions prediction in PPI with different amino acid mutation using machine learning algorithm

Discovering hot regions in protein–protein interactions (PPI) is important for drug and protein design, but identifying them experimentally is time-consuming and labor-intensive. In addition, different amino acid mutations produce different energy changes, which in turn lead to different feature selections in predictive models. Analyzing predictive models by amino acid mutation type with machine learning algorithms can therefore be very helpful. In this paper, 20 datasets are first constructed from SKEMPI mutation data, one for each of the 20 amino acid types. Predictive models that combine feature-based classification with density-based incremental clustering are then applied to each dataset separately. Experimental results show that RctASA, Hydrophobicity, BpASA and BminCX are the best features for discriminating hot spots from non-hot spots across all datasets, and that the datasets for LEU, GLY and PRO mutations yield better prediction performance than those for the other amino acids. These analyses provide better insight into proteins and their interactions.
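The dataset construction step can be illustrated with a minimal sketch. It assumes each mutation record has already been reduced to a (residue, ΔΔG) pair, and that records are grouped by residue type. The 2.0 kcal/mol hot spot cutoff, the record layout and the function name `split_by_residue` are illustrative assumptions and are not taken from the paper or from the SKEMPI file format.

```python
# The 20 standard amino acids, as three-letter codes.
AMINO_ACIDS = (
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
)

# Assumed cutoff (kcal/mol) for labelling a residue as a hot spot.
HOT_SPOT_DDG = 2.0


def split_by_residue(records):
    """Group (residue, ddg) records into one dataset per amino acid type.

    Each dataset entry is (ddg, is_hot_spot). Records whose residue code
    is not one of the 20 standard amino acids are skipped.
    """
    datasets = {aa: [] for aa in AMINO_ACIDS}
    for residue, ddg in records:
        residue = residue.upper()
        if residue not in datasets:
            continue
        datasets[residue].append((ddg, ddg >= HOT_SPOT_DDG))
    return datasets


# Hypothetical toy records, not real SKEMPI entries.
toy_records = [("LEU", 2.4), ("GLY", 0.3), ("leu", 1.1), ("XXX", 5.0)]
datasets = split_by_residue(toy_records)
```

Each of the 20 resulting lists could then be passed to a separate classifier, which mirrors the per-dataset modelling described above.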
