Prediction of solvent accessibility and sites of deleterious mutations from protein sequence

Residues that form the hydrophobic core of a protein are critical for its stability. A number of approaches have been developed to classify residues as buried or exposed. To optimize this classification, we refined a suite of five methods on a large dataset and propose a metamethod based on an ensemble average of the individual methods, which achieves a two-state classification accuracy of 80%. Because many studies have suggested that hydrophobic core residues are likely sites of deleterious mutations, we examined the extent to which such sites can be predicted from the putative buried residues. Residues most confidently classified as buried were proposed as sites of deleterious mutations. This proposal was tested on six proteins for which sites of deleterious mutations had previously been identified by stability measurements or functional assays. Of the 130 residues predicted as sites of deleterious mutations, 104 (80%) were correct.
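
The ensemble-average idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the five methods, their score scale, or any thresholds, so the function names, the assumption that each method emits a per-residue burial score in [0, 1], the 0.5 decision threshold, and the 0.8 confidence cutoff are all hypothetical choices made for illustration.

```python
from statistics import mean


def ensemble_scores(per_method_scores):
    """Average per-residue burial scores across methods.

    per_method_scores: one list per method, each holding one score per
    residue (assumed here to lie in [0, 1], higher meaning more buried).
    """
    n_residues = len(per_method_scores[0])
    if any(len(scores) != n_residues for scores in per_method_scores):
        raise ValueError("every method must score the same residues")
    # zip(*...) regroups the scores so each tuple holds one residue's
    # scores from all methods.
    return [mean(column) for column in zip(*per_method_scores)]


def classify(scores, threshold=0.5):
    """Two-state call per residue; the 0.5 threshold is an assumption."""
    return ["buried" if s >= threshold else "exposed" for s in scores]


def confident_buried_sites(scores, cutoff=0.8):
    """Indices of residues whose averaged score clears a high cutoff.

    These stand in for the 'most confidently buried' residues proposed
    as sites of deleterious mutations; the 0.8 cutoff is an assumption.
    """
    return [i for i, s in enumerate(scores) if s >= cutoff]


# Hypothetical scores from five methods for a four-residue sequence.
methods = [
    [0.9, 0.2, 0.6, 0.1],
    [0.8, 0.3, 0.7, 0.2],
    [1.0, 0.1, 0.5, 0.0],
    [0.9, 0.4, 0.6, 0.3],
    [0.9, 0.0, 0.6, 0.4],
]
averaged = ensemble_scores(methods)
calls = classify(averaged)
sites = confident_buried_sites(averaged)
```

With these made-up inputs, residues 0 and 2 are called buried, but only residue 0 clears the stricter cutoff. That separation between a two-state call and a high-confidence subset mirrors the distinction the abstract draws between classifying residues and proposing mutation sites.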
