The Protein Non-Folding Problem: Amino Acid Determinants of Intrinsic Order and Disorder

To investigate the determinants of protein order and disorder, three primary databases and one derivative database of intrinsically disordered proteins were compiled. The segments in each primary database were characterized by one of the following: X-ray crystallography, nuclear magnetic resonance (NMR), or circular dichroism (CD). The derivative database was based on homology. The three primary disordered databases contain a combined total of 157 proteins or segments of length >30 with 18,010 residues, while the derivative database contains 572 proteins from 32 families with 52,688 putatively disordered residues. For the four disordered databases, the amino acid compositions were compared with those from a database of ordered structure. Relative to the ordered proteins, the intrinsically disordered segments in all four databases were significantly depleted in W, C, F, I, Y, V, L and N, significantly enriched in A, R, G, Q, S, P, E and K, and inconsistently different in H, M, T and D, suggesting that the first set be called order-promoting and the second set disorder-promoting. In addition, 265 amino acid properties were ranked by their ability to discriminate between order and disorder and then pruned to remove the most highly correlated pairs. The 10 highest-ranking properties after pruning consisted of 2 residue contact scales, 4 hydrophobicity scales, 3 scales associated with β-sheets and 1 polarity scale. Comparing the 3 primary databases using these 10 properties suggests that disorder in all 3 databases is very similar, with the NMR- and CD-characterized sets being the most similar, the CD and X-ray sets next, and the NMR and X-ray sets the least similar.
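The composition comparison described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the fractional-difference formula (disordered minus ordered, divided by ordered), and the toy sequences are all assumptions made for illustration, and no significance test is performed.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequences):
    """Return the fraction of each standard residue pooled over all sequences."""
    counts = Counter()
    for seq in sequences:
        # Ignore anything that is not one of the 20 standard one-letter codes.
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def relative_difference(disordered, ordered):
    """Fractional difference (disordered - ordered) / ordered per residue.

    Negative values mean the residue is depleted in the disordered set,
    positive values mean it is enriched. Residues absent from the ordered
    set are skipped to avoid division by zero.
    """
    d = composition(disordered)
    o = composition(ordered)
    return {aa: (d[aa] - o[aa]) / o[aa] for aa in AMINO_ACIDS if o[aa] > 0}


if __name__ == "__main__":
    # Hypothetical toy sequences, NOT data from the paper.
    ordered_toy = ["ACDEFGHIKLMNPQRSTVWY", "WWFFIILLVV"]
    disordered_toy = ["ACDEFGHIKLMNPQRSTVWY", "SSPPEEKKGG"]
    diff = relative_difference(disordered_toy, ordered_toy)
    for aa in sorted(diff, key=diff.get):
        print(f"{aa}: {diff[aa]:+.2f}")
```

With real data, each argument would be the list of sequences from one database, and a statistical test would be needed before calling any residue significantly enriched or depleted.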
