Statistically Based Reduced Representation of Amino Acid Side Chains

Preferred conformations of amino acid side chains are well established through statistically derived rotamer libraries. Typically, these libraries provide bond torsion angles that allow a side chain to be traced atom by atom. Where it is desirable to reduce the complexity of a protein representation or prediction, placing every side-chain atom may prove unwieldy. We therefore introduce a general parametrization that allows the positions of representative atoms (in the present study, terminal atoms) to be predicted directly from backbone atom coordinates. In a large, culled data set of amino acid residues from high-resolution protein crystal structures, between 1 and 7 preferred conformations were observed for each terminal atom of the non-glycine residues. One of the parameters determined for each conformation is the side-chain length from the backbone Cα, which should be useful in its own right. To validate the data set, terminal atoms were then predicted for a second, nonredundant set of protein structures. With four simple probabilistic approaches, Monte Carlo style prediction of terminal atom locations from backbone coordinates alone produced an average root-mean-square deviation (RMSD) of approximately 3 Å from the experimentally determined terminal atom positions. When prediction used conditional probabilities based on the side-chain χ1 rotamer, the average RMSD improved to 1.74 Å. The observed terminal atom conformations therefore provide reasonable and potentially highly accurate representations of side-chain conformation, offering a viable alternative to existing all-atom rotamers wherever a reduction in protein model complexity, or in the amount of data to be handled, is desired. One application of this representation with strong potential is the prediction of charge density in proteins. This would likely be especially valuable on protein surfaces, where side chains are much less likely to be fixed in single rotamers. Predicting ensembles of structures provides a method to determine the probability density of charge and atom location; such a prediction is demonstrated graphically.
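
The general idea of placing a terminal atom from backbone coordinates and scoring it by RMSD can be illustrated with a minimal Python sketch. It is not the parametrization used in the study, whose exact parameters are not given here. The sketch assumes a local frame built from the backbone N, Cα, and C atoms, and assumes that each conformation is described by a length from Cα, two spherical angles in that frame, and an occurrence probability. All function names and the numerical values in the hypothetical conformation table are placeholders chosen for illustration, not published data.

```python
import math
import random

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)

def local_frame(n, ca, c):
    """Right-handed orthonormal frame centred on CA (an assumed convention)."""
    x = unit(sub(n, ca))                # x axis points from CA toward N
    z = unit(cross(x, sub(c, ca)))      # z axis is normal to the N-CA-C plane
    y = cross(z, x)                     # y axis completes the frame
    return x, y, z

def place_terminal_atom(n, ca, c, length, theta, phi):
    """Convert (length, polar angle, azimuth) in the local frame to Cartesian."""
    x, y, z = local_frame(n, ca, c)
    lx = length * math.sin(theta) * math.cos(phi)
    ly = length * math.sin(theta) * math.sin(phi)
    lz = length * math.cos(theta)
    return tuple(ca[i] + lx * x[i] + ly * y[i] + lz * z[i] for i in range(3))

def sample_conformation(conformations, rng):
    """Draw one conformation with probability proportional to its weight."""
    weights = [conf["p"] for conf in conformations]
    return rng.choices(conformations, weights=weights, k=1)[0]

def rmsd(predicted, observed):
    """Root-mean-square deviation between matched lists of 3D points."""
    total = sum(dot(sub(p, o), sub(p, o)) for p, o in zip(predicted, observed))
    return math.sqrt(total / len(predicted))

if __name__ == "__main__":
    # Hypothetical conformation table for one residue type (placeholder values).
    conformations = [
        {"length": 2.5, "theta": 1.0, "phi": 0.5, "p": 0.6},
        {"length": 2.4, "theta": 2.0, "phi": 3.0, "p": 0.4},
    ]
    # Illustrative backbone coordinates in angstroms.
    n, ca, c = (1.458, 0.0, 0.0), (0.0, 0.0, 0.0), (-0.55, 1.42, 0.0)
    rng = random.Random(0)
    conf = sample_conformation(conformations, rng)
    atom = place_terminal_atom(n, ca, c, conf["length"], conf["theta"], conf["phi"])
    print("sampled terminal atom position:", atom)
```

Repeating the sampling step many times for each residue would give an ensemble of terminal atom positions, from which a probability density of atom location could be accumulated. Conditioning on the χ1 rotamer would amount to using a separate probability column for each χ1 state in the table.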
