Bayesian Segmentation of Protein Secondary Structure

We present a novel method for predicting the secondary structure of a protein from its amino acid sequence. Most existing methods predict each position in turn from a local window of residues, sliding this window along the length of the sequence. In contrast, we develop a probabilistic model of protein sequence/structure relationships in terms of structural segments, and formulate secondary structure prediction as a general Bayesian inference problem. A distinctive feature of our approach is the ability to develop explicit probabilistic models for alpha-helices, beta-strands, and other classes of secondary structure, incorporating experimentally and empirically observed aspects of protein structure such as helical capping signals, side chain correlations, and segment length distributions. Our model is Markovian in the segments, permitting efficient exact calculation of the posterior probability distribution over all possible segmentations of the sequence using dynamic programming. The optimal segmentation is computed and compared with a predictor based on marginal posterior modes, and the latter is shown to provide a significant improvement in predictive accuracy. The marginalization procedure provides exact secondary structure probabilities at each sequence position, which are shown to be reliable estimates of prediction uncertainty. We apply this model to a database of 452 nonhomologous structures, achieving accuracies as high as those of the best currently available methods. We conclude by discussing an extension of this framework to model nonlocal interactions in protein structures, providing a possible direction for future improvements in secondary structure prediction accuracy.
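To make the segment-based inference concrete, the following is a minimal sketch of dynamic programming over segmentations. It is not the model described in the abstract. It assumes three segment types, residues that are independent within a segment (so no capping signals or side chain correlations), and hypothetical toy parameters. The names `seg_lik`, `posterior_marginals`, and `map_segmentation` are illustrative. One function computes exact per-position marginal posteriors by a forward-backward recursion over segments, and the other computes the single most probable segmentation.

```python
STATES = ("H", "E", "L")  # helix, strand, loop/other (illustrative labels)

def seg_lik(seq, i, j, s, emit, length_p):
    """Likelihood that residues seq[i:j] form one segment of type s.
    Simplifying assumption: residues are independent within a segment."""
    d = j - i
    if d > len(length_p[s]):
        return 0.0
    p = length_p[s][d - 1]          # segment length distribution
    for aa in seq[i:j]:
        p *= emit[s][aa]            # per-residue emission probability
    return p

def posterior_marginals(seq, emit, length_p, trans, init):
    """Exact P(type at position k | seq) summed over all segmentations."""
    n, S = len(seq), range(len(STATES))
    lik = lambda i, j, s: seg_lik(seq, i, j, s, emit, length_p)
    fwd = [[0.0] * len(STATES) for _ in range(n + 1)]

    def start(i, s):  # weight of everything before a type-s segment starting at i
        return init[s] if i == 0 else sum(fwd[i][r] * trans[r][s] for r in S)

    for j in range(1, n + 1):       # fwd[j][s]: a type-s segment ends at j
        for s in S:
            fwd[j][s] = sum(start(i, s) * lik(i, j, s) for i in range(j))
    bwd = [[1.0] * len(STATES) for _ in range(n + 1)]
    for i in range(n - 1, 0, -1):   # bwd[i][r]: weight of seq[i:] after a type-r segment
        for r in S:
            bwd[i][r] = sum(trans[r][s] * lik(i, j, s) * bwd[j][s]
                            for s in S for j in range(i + 1, n + 1))
    evidence = sum(fwd[n])
    post = [[0.0] * len(STATES) for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            for s in S:
                w = start(i, s) * lik(i, j, s) * bwd[j][s] / evidence
                for k in range(i, j):   # segment [i, j) covers positions i..j-1
                    post[k][s] += w
    return post, evidence

def map_segmentation(seq, emit, length_p, trans, init):
    """Most probable single segmentation (Viterbi over segments)."""
    n, S = len(seq), range(len(STATES))
    best = [[0.0] * len(STATES) for _ in range(n + 1)]
    back = [[None] * len(STATES) for _ in range(n + 1)]
    for j in range(1, n + 1):
        for s in S:
            for i in range(j):
                lik = seg_lik(seq, i, j, s, emit, length_p)
                if i == 0:
                    cands = [(init[s] * lik, None)]
                else:
                    cands = [(best[i][r] * trans[r][s] * lik, r) for r in S]
                for score, r in cands:
                    if score > best[j][s]:
                        best[j][s], back[j][s] = score, (i, r)
    s = max(S, key=lambda t: best[n][t])
    labels, j = [], n
    while j > 0:                    # trace segments back from the end
        i, r = back[j][s]
        labels[:0] = [STATES[s]] * (j - i)
        j, s = i, r
    return "".join(labels)

if __name__ == "__main__":
    # Hypothetical toy parameters over a three-letter alphabet.
    emit = [{"A": 0.6, "V": 0.2, "G": 0.2},
            {"A": 0.2, "V": 0.6, "G": 0.2},
            {"A": 0.2, "V": 0.2, "G": 0.6}]
    length_p = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.4, 0.4], [0.5, 0.3, 0.2]]
    trans = [[0.0, 0.3, 0.7], [0.3, 0.0, 0.7], [0.5, 0.5, 0.0]]
    init = [0.3, 0.3, 0.4]
    seq = "GAAAAGVVVG"
    post, _ = posterior_marginals(seq, emit, length_p, trans, init)
    modes = "".join(STATES[max(range(3), key=p.__getitem__)] for p in post)
    print("marginal posterior modes:", modes)
    print("MAP segmentation:        ", map_segmentation(seq, emit, length_p, trans, init))
```

The two predictors can disagree: the MAP segmentation is the best single path, while the posterior modes maximize the expected number of correctly labelled positions. This sketch multiplies raw probabilities and recomputes segment likelihoods, so it is cubic in sequence length and would underflow on real proteins; a practical version would work in log space and cache segment likelihoods.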
