Grading amino acid properties increased accuracies of single point mutation on protein stability prediction

Background
Point mutations introduced into a protein can sometimes affect its stability. Current sequence-based encoding schemes used by machine learning approaches to protein stability prediction include sparse encoding and amino acid property encoding. Property encoding schemes capture physicochemical information about the environment of the mutated residue. However, they also add complexity when many properties are combined in one scheme, and this complexity introduces noise that lowers the accuracy of machine learning algorithms. To overcome this problem, we describe a new encoding scheme that grades the twenty amino acids into groups according to their values for each property.

Results
We used three predefined values, 0.1, 0.5, and 0.9, to represent the 'weak', 'middle', and 'strong' groups of each amino acid property, and introduced two thresholds per property to assign each of the twenty amino acids to one of the three groups according to its property value. For each property, an amino acid therefore takes one of three predefined values rather than one of twenty distinct values, which reduces the complexity and noise of the encoding scheme. The graded amino acid property encoding schemes improved average accuracy by more than 7% in 20-fold cross-validation. Starting from sequence information alone and using three-state prediction definitions, our method achieves an overall accuracy of more than 72% on the independent test sets.

Conclusions
Grading the numeric values of amino acid properties reduces the noise and complexity of the input information. This is consistent with biochemical concepts of amino acid properties and simplifies the input data at the same time. Graded property encoding may also be applicable to other protein-related predictions that use machine learning.
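The grading step described in the Results can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Kyte-Doolittle hydropathy scale (Kyte and Doolittle, 1982) stands in for one property, the thresholds -1.0 and 1.0 are hypothetical because the abstract does not report the actual thresholds, values exactly at a threshold are assumed to fall into the outer group, and `encode_window`, with '-' padding for positions beyond the sequence ends, is a hypothetical layout for encoding the residues around the mutated site.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy values, used here only as an example property.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# The three predefined graded values from the abstract.
WEAK, MIDDLE, STRONG = 0.1, 0.5, 0.9


def grade_property(values: Dict[str, float], low: float, high: float) -> Dict[str, float]:
    """Map each amino acid's raw property value to one of three graded values.

    Assumed boundary rule (not stated in the source):
    value <= low -> WEAK, value >= high -> STRONG, otherwise MIDDLE.
    """
    if low >= high:
        raise ValueError("low threshold must be smaller than high threshold")
    graded: Dict[str, float] = {}
    for aa, value in values.items():
        if value <= low:
            graded[aa] = WEAK
        elif value >= high:
            graded[aa] = STRONG
        else:
            graded[aa] = MIDDLE
    return graded


def encode_window(window: str, graded: Dict[str, float], pad: float = 0.0) -> List[float]:
    """Hypothetical encoding: one graded feature per window position.

    Characters not in `graded` (e.g. '-' used as padding beyond the
    sequence ends) are encoded with `pad`.
    """
    return [graded.get(aa, pad) for aa in window]


# Usage with hypothetical thresholds -1.0 and 1.0.
graded_kd = grade_property(KYTE_DOOLITTLE, low=-1.0, high=1.0)
print(graded_kd["I"], graded_kd["G"], graded_kd["R"])  # 0.9 0.5 0.1
print(encode_window("--MKV", graded_kd))  # [0.0, 0.0, 0.9, 0.1, 0.9]
```

Because each residue is reduced to one of three levels per property, residues whose raw values differ only slightly (G at -0.4 and S at -0.8 in this sketch) map to the same feature value, which is how grading reduces the input complexity described above.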
