First insight into the prediction of protein folding rate change upon point mutation

SUMMARY: Accurately predicting how a mutation changes a protein's folding rate is an important and challenging problem in protein folding kinetics and design. In this work, we collected experimental data on folding rate change upon mutation from various sources and constructed a reliable, non-redundant dataset of 467 mutants. These mutants are widely distributed with respect to secondary structure, solvent accessibility, conservation score and long-range contacts. From a systematic analysis of these parameters, together with a set of 49 amino acid properties, we selected 12 features for discriminating between mutants that speed up the folding process and mutants that slow it down. We developed a method based on quadratic regression models for discriminating accelerating from decelerating mutants, which showed an accuracy of 74% in a 10-fold cross-validation test. The sensitivity and specificity are 63% and 76%, respectively. The method could be improved further by including physical interactions and structure-based parameters.

AVAILABILITY: http://bioinformatics.myweb.hinet.net/freedom.htm
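The evaluation quantities reported in the summary (accuracy, sensitivity and specificity under 10-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `k_fold_indices` and `binary_metrics`, the choice of accelerating mutants as the positive class (label 1), and the toy labels are all assumptions made for illustration. The quadratic regression model itself is not reproduced here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds.

    Hypothetical helper: each fold serves once as the held-out test set
    while the remaining k-1 folds are used for training.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k yields fold sizes that differ by at most one.
    return [indices[i::k] for i in range(k)]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity for 0/1 labels.

    Assumption: 1 = accelerating mutant (positive class),
                0 = decelerating mutant (negative class).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Fraction of true positives that were recovered.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of true negatives that were recovered.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy labels (not data from the paper), pooled over all held-out folds.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 1, 0]
toy_scores = binary_metrics(toy_true, toy_pred)

# Partition a dataset of 467 mutants (the size stated above) into 10 folds.
folds = k_fold_indices(467, k=10)
```

In a cross-validation run of this kind, predictions from the ten held-out folds are typically pooled before the three metrics are computed, so that every mutant is scored exactly once by a model that did not see it during training.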
