Predicting intrinsic disorder from amino acid sequence

Blind predictions of intrinsic order and disorder were made on 42 proteins that were subsequently found to contain 9,044 ordered residues, 284 disordered residues in 26 segments of 30 residues or fewer, and 281 disordered residues in 2 segments longer than 30 residues. The accuracies of the six predictors used in this experiment ranged from 77% to 91% on the ordered regions and from 56% to 78% on the disordered segments. The averages of the order and disorder accuracies ranged from 73% to 77%. Disorder in the shorter segments was predicted poorly (25% to 66% correct), whereas disorder in the longer segments was predicted better (75% to 95% correct). Four of the predictors were ensembles of neural networks. This design let them handle the large asymmetry in the training data more efficiently, through diversified sampling from the much larger ordered set, and achieve better accuracy on ordered and long disordered regions. Training the predictors exclusively on long disordered regions likely contributed to the disparity between predictions on long and short disordered regions, while averaging the output values over 61‐residue windows to eliminate short predictions of order or disorder probably widened that disparity further for three of the predictors. This experiment supports the view that intrinsic disorder is predictable from amino acid sequence. Proteins 2003;53:566–572. © 2003 Wiley‐Liss, Inc.
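Two mechanical points in the abstract, scoring ordered and disordered residues separately and averaging per-residue outputs over 61-residue windows, can be illustrated with a small sketch. It assumes each predictor emits a per-residue disorder score in [0, 1], thresholded at 0.5, and that the window is simply truncated at the chain termini; neither detail is stated in the abstract, and the function names are hypothetical. The toy chain shows one way window averaging can erase a short disordered segment that the raw scores detect.

```python
from typing import List, Sequence, Tuple


def smooth(scores: Sequence[float], window: int = 61) -> List[float]:
    """Replace each residue's score by the mean over a centered window.

    Assumption: near the chain termini the window is truncated to the
    residues that exist; the original predictors may treat ends differently.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    # Prefix sums let each window mean be computed in constant time.
    prefix = [0.0]
    for s in scores:
        prefix.append(prefix[-1] + s)
    n = len(scores)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


def classify(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """True means predicted disordered (the 0.5 cutoff is an assumption)."""
    return [s >= threshold for s in scores]


def per_class_accuracy(
    predicted: Sequence[bool], observed: Sequence[bool]
) -> Tuple[float, float, float]:
    """Return (order accuracy, disorder accuracy, mean of the two)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    n_disordered = sum(observed)
    n_ordered = len(observed) - n_disordered
    # Residues correctly called disordered / correctly called ordered.
    hits_dis = sum(p and o for p, o in zip(predicted, observed))
    hits_ord = sum(not p and not o for p, o in zip(predicted, observed))
    acc_ord = hits_ord / n_ordered if n_ordered else float("nan")
    acc_dis = hits_dis / n_disordered if n_disordered else float("nan")
    return acc_ord, acc_dis, (acc_ord + acc_dis) / 2


if __name__ == "__main__":
    # Toy 100-residue chain: a 10-residue disordered segment at positions
    # 40..49 that the raw scores flag clearly.
    raw = [0.9 if 40 <= i < 50 else 0.2 for i in range(100)]
    observed = [40 <= i < 50 for i in range(100)]
    print(per_class_accuracy(classify(raw), observed))          # (1.0, 1.0, 1.0)
    print(per_class_accuracy(classify(smooth(raw)), observed))  # (1.0, 0.0, 0.5)
```

Within any 61-residue window the ten high scores are diluted by low ones (the smoothed score never exceeds about 0.34 here), so the short segment drops below the cutoff, while a segment longer than half the window would dominate its central windows and survive. Reporting order and disorder accuracy separately, and then their mean, keeps the roughly 16:1 excess of ordered residues from masking such misses.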
