On the properties and sequence context of structurally ambivalent fragments in proteins

The goal of this work is to characterize structurally ambivalent fragments in proteins. We have searched the Protein Data Bank and identified all structurally ambivalent peptides (SAPs) of length five or greater that exist in two different backbone conformations. The SAPs were classified into five distinct categories based on their structure. We propose a novel index that provides a quantitative measure of the conformational variability of a sequence fragment. It measures the context‐dependent width of the distribution of (ϕ,ψ) dihedral angles associated with each amino acid type. This index was used to analyze the local structural propensity of both SAPs and the sequence fragments contiguous to them. We also analyzed the type‐specific amino acid composition, solvent accessibility, and overall structural properties of SAPs and their sequence context. We show that each type of SAP has an unusual, type‐specific amino acid composition and, as a result, simultaneous intrinsic preferences for two distinct types of backbone conformation. All types of SAPs have lower sequence complexity than average. Fragments that adopt a helical conformation in one protein and a sheet conformation in another have the lowest sequence complexity and are sampled from a relatively limited repertoire of possible residue combinations. A statistically significant difference between the two distinct conformations of the same SAP is observed not only in the overall structural properties of the proteins harboring the SAP but also in the properties of its flanking regions and in the pattern of solvent accessibility. These results have implications for protein design and structure prediction.
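To make the idea of a structurally ambivalent fragment concrete, the following is a minimal Python sketch and not the procedure used in this work. It assumes each chain is given as a sequence paired with a DSSP-style secondary structure string, reduces the codes to three states (helix, strand, coil) through a hypothetical mapping `REDUCE`, and flags identical fragments of a fixed length `k` that occur with more than one reduced-state pattern. Shannon entropy of the residue composition is used here only as a simple stand-in for sequence complexity; the function names, the three-state reduction, and the toy chains are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical three-state reduction of DSSP-style codes (an assumption,
# not taken from the paper): helices -> H, strands/bridges -> E, rest -> C.
REDUCE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_states(ss):
    """Map a DSSP-style string onto the three states H, E and C."""
    return "".join(REDUCE.get(code, "C") for code in ss)


def find_saps(chains, k=5):
    """Return {fragment: set of reduced-state strings} for every length-k
    fragment observed with at least two different state patterns."""
    seen = defaultdict(set)
    for seq, ss in chains:
        reduced = reduce_states(ss)
        for i in range(len(seq) - k + 1):
            seen[seq[i:i + k]].add(reduced[i:i + k])
    return {frag: confs for frag, confs in seen.items() if len(confs) >= 2}


def shannon_entropy(fragment):
    """Shannon entropy (bits) of the residue composition of a fragment;
    a simple proxy for sequence complexity."""
    counts = Counter(fragment)
    n = len(fragment)
    return -sum(c / n * log2(c / n) for c in counts.values())


# Toy input: two made-up chains sharing the pentapeptide ALAVA,
# helical in the first and extended in the second.
chains = [
    ("MKALAVAGT", "CHHHHHHCC"),
    ("GSALAVAEK", "CEEEEEECC"),
]
saps = find_saps(chains, k=5)
for frag, confs in saps.items():
    print(frag, sorted(confs), round(shannon_entropy(frag), 3))
```

A real analysis would also need a nonredundant chain set and a stricter definition of "different conformation" than unequal state strings, for example one based on backbone dihedral angles.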
