The Sequence Attribute Method for Determining Relationships Between Sequence and Protein Disorder

The conditional probability, P(s|x), is the probability that the event s will occur given prior knowledge of the value of x. If x is given and s is randomly distributed, an empirical approximation of the true conditional probability can be computed by applying Bayes' theorem. Here s represents one of two structural classes, either ordered, s(o), or disordered, s(d), and x represents an attribute value calculated over a window of 21 amino acids. Plots of P(s|x) versus x show how the given sequence attribute correlates with order or disorder, and they allow individual attributes to be compared quantitatively for their ability to discriminate between the ordered and disordered states. Using such quantitative comparisons, 38 different sequence attributes have been rank-ordered. Of the attributes tested so far, those based on cysteine, aromatic residues, flexibility, and charge were found to be the best at distinguishing order from disorder.
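To make the procedure concrete, here is a minimal Python sketch under stated assumptions. Bayes' theorem gives P(d|x) = P(x|d)P(d) / [P(x|d)P(d) + P(x|o)P(o)], with P(o|x) = 1 - P(d|x); the sketch estimates the class-conditional terms P(x|o) and P(x|d) as normalised histograms of x over ordered and disordered windows. The example attribute (fraction of aromatic residues F, W, Y), the bin count, the use of the data's class frequencies as default priors, and the function names `window_attribute` and `p_disorder_given_x` are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

WINDOW = 21  # window length given in the text

# Assumption: the example attribute is the fraction of aromatic residues
# (F, W, Y) in each window; the paper itself compared 38 attributes.
AROMATIC = frozenset("FWY")


def window_attribute(seq, window=WINDOW):
    """Attribute value for every full-length window, keyed by the window's centre index.

    Hypothetical helper; `seq` is an upper-case one-letter amino acid string.
    """
    half = window // 2
    values = {}
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        values[start + half] = sum(aa in AROMATIC for aa in segment) / window
    return values


def p_disorder_given_x(ordered_x, disordered_x, n_bins=10, prior_d=None):
    """Estimate P(d|x) for each bin of x in [0, 1] using Bayes' theorem.

    Hypothetical helper. P(x|o) and P(x|d) are estimated as normalised
    histograms of the ordered and disordered window values. If prior_d is
    None, the class frequencies in the data are used as P(d) and P(o).
    Bins whose estimated P(x) is zero are omitted from the result.
    """
    if not ordered_x or not disordered_x:
        raise ValueError("need attribute values from both classes")

    def bin_of(x):
        return min(int(x * n_bins), n_bins - 1)

    n_o, n_d = len(ordered_x), len(disordered_x)
    if prior_d is None:
        prior_d = n_d / (n_o + n_d)
    prior_o = 1.0 - prior_d
    hist_o = Counter(bin_of(x) for x in ordered_x)
    hist_d = Counter(bin_of(x) for x in disordered_x)

    result = {}
    for b in range(n_bins):
        like_o = hist_o[b] / n_o  # P(x in bin b | ordered)
        like_d = hist_d[b] / n_d  # P(x in bin b | disordered)
        evidence = like_d * prior_d + like_o * prior_o  # P(x in bin b)
        if evidence > 0:
            result[b] = like_d * prior_d / evidence  # P(disordered | x in bin b)
    return result
```

In use, one would collect `window_attribute` values for windows labelled ordered or disordered (for example, by the class of each window's centre residue, which is an assumption here), pass the two lists to `p_disorder_given_x`, and plot the returned probabilities against the bin centres, (b + 0.5) / n_bins, to obtain a conditional probability plot. With the default priors, each bin reduces to n_d / (n_d + n_o), the fraction of disordered windows in that bin. The abstract does not say how the 38 attributes were scored for ranking, so the sketch stops at the per-bin estimate.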