Evaluation of local structure alphabets based on residue burial

Residue burial, which describes a protein residue's exposure to solvent and to neighboring atoms, is key to protein structure prediction, modeling, and analysis. We assessed 21 alphabets representing residue burial according to three criteria: their predictability from amino acid sequence, their conservation in structural alignments, and their utility in one fold‐recognition scenario. This work follows our previous assessment of nine representations of backbone geometry.¹ The alphabet found to be most effective overall has seven states and is based on a count of Cβ atoms within a 14 Å‐radius sphere centered at the Cβ of the residue of interest. When incorporated into a hidden Markov model (HMM), this alphabet improved fold‐recognition performance by 38% and alignment quality by 23%. Proteins 2004. © 2004 Wiley‐Liss, Inc.
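A minimal sketch of the burial measure described above: for each residue, count the Cβ atoms lying within a 14 Å sphere centered on its own Cβ, then discretize the count into one of seven states. It assumes Cβ coordinates are already available as (x, y, z) tuples in Å (glycine, which has no Cβ, would need a substitute such as a virtual Cβ). The function names, the choice to exclude a residue's own Cβ from its count, and the bin edges in `HYPOTHETICAL_EDGES` are illustrative assumptions, not the thresholds used in the study.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def cb_burial_counts(cb_coords: Sequence[Point], radius: float = 14.0) -> List[int]:
    """For each residue, count the other Cb atoms within `radius` angstroms of its Cb."""
    r2 = radius * radius
    counts = []
    for i, (xi, yi, zi) in enumerate(cb_coords):
        n = 0
        for j, (xj, yj, zj) in enumerate(cb_coords):
            if i == j:
                continue  # assumption: the residue's own Cb is not counted
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if d2 <= r2:  # compare squared distances to avoid a sqrt per pair
                n += 1
        counts.append(n)
    return counts

# Hypothetical bin edges: six cut points split counts into seven states A..G.
# These are NOT the published thresholds; they only illustrate the binning step.
HYPOTHETICAL_EDGES = [10, 20, 30, 40, 50, 60]
STATES = "ABCDEFG"

def burial_states(counts, edges=HYPOTHETICAL_EDGES) -> str:
    """Map each count to a state letter; a count equal to an edge falls in the higher bin."""
    return "".join(STATES[bisect_right(edges, c)] for c in counts)

if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    counts = cb_burial_counts(coords)
    print(counts, burial_states(counts))
```

For the three collinear points in the usage example, the middle residue has two neighbors within 14 Å and the end residues one each. The all-pairs loop is quadratic in chain length; a spatial index such as a k-d tree would scale better for large structures.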

[1] Christopher T. Saunders et al. Evaluation of structural and evolutionary contributions to deleterious mutation prediction. Journal of Molecular Biology, 2002.

[2] Melissa S. Cline et al. Predicting reliable regions in protein sequence alignments. Bioinformatics, 2002.

[3] D. Cooper et al. Assessing the relative importance of the biophysical properties of amino acid substitutions associated with human genetic disease. Human Mutation, 2002.

[4] Cyrus Levinthal et al. A vectorized algorithm for calculating the accessible surface area of macromolecules. 1991.

[5] B. Rost et al. Conservation and prediction of solvent accessibility in protein families. Proteins, 1994.

[6] P. Argos et al. Knowledge-based protein secondary structure assignment. Proteins, 1995.

[7] A. Dean et al. Enzyme evolution explained (sort of). Pacific Symposium on Biocomputing, 1999.

[8] C. Sander et al. Protein structure comparison by alignment of distance matrices. Journal of Molecular Biology, 1993.

[9] R. A. Goldstein et al. Predicting solvent accessibility: higher accuracy using Bayesian statistics and optimized residue substitution classes. Proteins, 1996.

[10] A. D. McLachlan et al. Solvation energy in protein folding and binding. Nature, 1986.

[11] M. Levitt et al. Comprehensive assessment of automatic structural alignment against a manual standard, the SCOP classification of proteins. Protein Science, 1998.

[12] P. Baldi et al. Prediction of coordination number and relative solvent accessibility in proteins. Proteins, 2002.

[13] A. Sali et al. Protein structure prediction and structural genomics. Science, 2001.

[14] Piero Fariselli et al. Prediction of the number of residue contacts in proteins. ISMB, 2000.

[15] V. Thorsson et al. HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins. Journal of Molecular Biology, 2000.

[16] Liam J. McGuffin et al. Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics, 2003.

[17] G. N. Ramachandran et al. Stereochemistry of polypeptide chain configurations. Journal of Molecular Biology, 1963.

[18] C. Sander et al. Excluded volume approximation to protein-solvent interaction. The solvent contact model. Biophysical Journal, 1990.

[19] C. Barrett. Investigation of non-pairwise protein structure score functions using sets of decoy structures. 2001.

[20] Melissa S. Cline et al. On alignment shift and its measures. 1998.

[21] D. Haussler et al. Information-theoretic dissection of pairwise contact potentials. Proteins, 2002.

[22] Zoran Obradovic et al. The protein non-folding problem: amino acid determinants of intrinsic order and disorder. Pacific Symposium on Biocomputing, 2000.

[23] B. Lee et al. The interpretation of protein structures: estimation of static accessibility. Journal of Molecular Biology, 1971.

[24] K. Karplus et al. Evaluating local structure alphabets for protein structure prediction. 2003.

[25] B. Rost et al. Redefining the goals of protein secondary structure prediction. Journal of Molecular Biology, 1994.

[26] J. F. Gibrat et al. Surprising similarities in structure comparison. Current Opinion in Structural Biology, 1996.

[27] W. Kabsch et al. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 1983.

[28] D. J. Barlow et al. The bottom line for prediction of residue solvent accessibility. Protein Engineering, 1999.

[29] K. Nishikawa et al. Radial locations of amino acid residues in a globular protein: correlation with the sequence. Journal of Biochemistry, 1986.

[30] Yael Mandel-Gutfreund et al. On the significance of alternating patterns of polar and non-polar residues in beta-strands. Journal of Molecular Biology, 2002.

[31] Michael Gribskov et al. Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Comput. Chem., 1996.

[32] Richard Bonneau et al. Ab initio protein structure prediction of CASP III targets using ROSETTA. Proteins, 1999.

[33] C. Sander et al. Mapping the protein universe. Science, 1996.

[34] T. Richmond et al. Solvent accessible surface area and excluded volume in proteins. Analytical equations for overlapping spheres and implications for the hydrophobic effect. Journal of Molecular Biology, 1984.

[35] F. M. Richards et al. Areas, volumes, packing and protein structure. Annual Review of Biophysics and Bioengineering, 1977.

[36] A. Sali et al. Statistical potentials for fold assessment. 2009.

[37] Kenneth M. Merz et al. Rapid approximation to molecular surface area via the use of Boolean logic and look-up tables. J. Comput. Chem., 1993.

[38] H. Naderi-Manesh et al. Prediction of protein surface accessibility with information theory. Proteins, 2000.

[39] C. Etchebest et al. Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks. Proteins, 2000.

[40] D. Thirumalai et al. Pair potentials for protein folding: choice of reference states and sensitivity of predicted native states to variations in the interaction schemes. Protein Science, 2008.

[41] A. Shrake et al. Environment and exposure to solvent of protein atoms. Lysozyme and insulin. Journal of Molecular Biology, 1973.

[42] C. Chothia. Principles that determine the structure of proteins. Annual Review of Biochemistry, 1984.

[43] K. Karplus et al. Hidden Markov models that use predicted local structure for fold recognition: alphabets of backbone geometry. Proteins, 2003.

[44] B. Rost. PHD: predicting one-dimensional protein structure by profile-based neural networks. Methods in Enzymology, 1996.

[45] Piero Fariselli et al. RCNPRED: prediction of the residue co-ordination numbers in proteins. Bioinformatics, 2001.

[46] Federico Fogolari et al. Amino acid empirical contact energy definitions for fold recognition in the space of contact maps. BMC Bioinformatics, 2003.

[47] K. Karplus et al. What is the value added by human intervention in protein structure prediction? Proteins, 2001.

[48] P. E. Bourne et al. An alternative view of protein fold space. Proteins, 2000.

[49] J. Janin et al. Analytical approximation to the accessible surface area of proteins. Proceedings of the National Academy of Sciences of the United States of America, 1980.

[50] M. J. Sippl et al. Progress in fold recognition. Proteins, 1995.

[51] Warren C. Lathe et al. Prediction of deleterious human alleles. Human Molecular Genetics, 2001.