Protein flexibility and intrinsic disorder

Comparisons were made among four categories of protein flexibility: (1) low‐B‐factor ordered regions, (2) high‐B‐factor ordered regions, (3) short disordered regions, and (4) long disordered regions. Amino acid compositions of the four categories were found to be significantly different from each other, with high‐B‐factor ordered and short disordered regions being the most similar pair. The high‐B‐factor (flexible) ordered regions are characterized by a higher average flexibility index, higher average hydrophilicity, higher average absolute net charge, and higher total charge than disordered regions. The low‐B‐factor regions are significantly enriched in hydrophobic residues and depleted in the total number of charged residues compared to the other three categories. We examined the predictability of the high‐B‐factor regions and developed a predictor that discriminates between regions of low and high B‐factors. This predictor achieved an accuracy of 70% and a correlation of 0.43 with experimental data, outperforming the 64% accuracy and 0.32 correlation of predictors based solely on flexibility indices. To further clarify the differences between short disordered regions and ordered regions, a predictor of short disordered regions was developed. Its relatively high accuracy of 81% indicates considerable differences between ordered and disordered regions. The distinctive amino acid biases of high‐B‐factor ordered regions, short disordered regions, and long disordered regions indicate that the sequence determinants for these flexibility categories differ from one another, whereas the significantly‐greater‐than‐chance predictability of these categories from sequence suggests that flexible ordered regions, short disorder, and long disorder are, to a significant degree, encoded at the primary structure level.
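The composition-based quantities compared above (hydrophilicity, absolute net charge, total charge) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the Kyte-Doolittle hydropathy scale as the hydrophilicity measure (lower values are more hydrophilic), counts only K and R as positive and D and E as negative (histidine is treated as neutral), and normalizes each quantity by region length. The function name `region_features` and the returned keys are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
# Assumption: this scale stands in for the hydrophilicity measure.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumption: histidine is treated as uncharged.
POSITIVE = frozenset("KR")
NEGATIVE = frozenset("DE")


def region_features(sequence):
    """Return per-residue charge and hydropathy summaries for one region.

    `sequence` must be non-empty and contain only the 20 standard
    one-letter codes; any other symbol raises KeyError.
    """
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")

    n_pos = sum(1 for r in seq if r in POSITIVE)
    n_neg = sum(1 for r in seq if r in NEGATIVE)
    mean_hydropathy = sum(KYTE_DOOLITTLE[r] for r in seq) / n

    return {
        # |positives - negatives| per residue
        "abs_net_charge": abs(n_pos - n_neg) / n,
        # (positives + negatives) per residue
        "total_charge": (n_pos + n_neg) / n,
        # mean Kyte-Doolittle value; more negative means more hydrophilic
        "mean_hydropathy": mean_hydropathy,
    }


if __name__ == "__main__":
    for name, seq in [("charged", "KKKD"), ("hydrophobic", "AAAA")]:
        print(name, region_features(seq))
```

Under these assumptions, a region with many charged residues scores high on total charge and low on mean hydropathy, which is the direction of difference reported above between the more flexible categories and low‐B‐factor regions.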
