Discovering structural correlations in α‐helices

We have developed a new representation for structural and functional motifs in protein sequences, based on correlations between pairs of amino acids, and applied it to α‐helical and β‐sheet sequences. Existing probabilistic methods for representing and analyzing protein sequences have traditionally assumed conditional independence of evidence; that is, the amino acid at one position is assumed to have no effect on the amino acid at any other. However, analyses of protein structures have repeatedly demonstrated the importance of interactions between amino acids in conferring both structure and function. Using Bayesian networks, we model the relationships between amino acids at distinct positions in a protein sequence in addition to the amino acid distribution at each position. We have also developed an automated program that discovers sequence correlations using standard statistical tests and validation techniques. In this paper, we test this program on sequences from two secondary structure motifs, α‐helices and β‐sheets. In each case, the correlations our program discovers correspond well with known physical and chemical interactions between amino acids in structures. Furthermore, we show that when the amino acids are recoded using different chemical alphabets, the structural relationships we discover reflect the same chemical principle used to construct each alphabet. This new representation of three‐dimensional features in protein motifs, such as those arising from structural or functional constraints on the sequence, can be used to improve sequence analysis tools, including pattern analysis and database search.
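
To make the idea of pairwise position correlations concrete, the following is a minimal sketch and not the program described above. It assumes equal-length, pre-aligned sequences and uses a Pearson chi-square statistic as one possible "standard statistical test"; the abstract does not say which test is used. The two-class `HYDROPHOBIC` alphabet and the function names `recode`, `chi_square`, and `rank_pairs` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Assumption: an illustrative two-class chemical alphabet (hydrophobic vs. other).
# The alphabets used in the paper are not given in the abstract.
HYDROPHOBIC = set("AVLIMFWC")


def recode(seqs, hydrophobic=HYDROPHOBIC):
    """Map each residue to 'h' (hydrophobic) or 'p' (everything else)."""
    return ["".join("h" if aa in hydrophobic else "p" for aa in s) for s in seqs]


def chi_square(seqs, i, j):
    """Pearson chi-square statistic and degrees of freedom for columns i and j."""
    n = len(seqs)
    pair = Counter((s[i], s[j]) for s in seqs)   # observed joint counts
    left = Counter(s[i] for s in seqs)           # marginal counts at position i
    right = Counter(s[j] for s in seqs)          # marginal counts at position j
    stat = 0.0
    for a, count_a in left.items():
        for b, count_b in right.items():
            expected = count_a * count_b / n     # count expected under independence
            observed = pair.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(left) - 1) * (len(right) - 1)
    return stat, dof


def rank_pairs(seqs):
    """Return (statistic, i, j) for every position pair, strongest first."""
    length = len(seqs[0])
    scored = [(chi_square(seqs, i, j)[0], i, j)
              for i, j in combinations(range(length), 2)]
    return sorted(scored, reverse=True)


# Toy data: positions 0 and 2 vary together, position 1 varies independently.
toy = ["AKE", "ALE", "GKD", "GLD"]
best_stat, best_i, best_j = rank_pairs(toy)[0]
```

On the toy data the top-ranked pair is positions 0 and 2. A real analysis would also need a significance threshold, a correction for testing many position pairs, and held-out validation, none of which this sketch attempts. Calling `recode` before `rank_pairs` shows how a chemical alphabet changes which dependencies can be detected.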
