Dynamic Programming Algorithms and Grammatical Modeling for Protein Beta-Sheet Prediction

Protein structure prediction is a major task in computational biology, and various methods have been proposed for secondary structure prediction. Predicting beta-sheet structures is particularly challenging because a single sheet spans several discontinuous regions of the amino acid sequence. In this article, we propose dynamic programming algorithms for several kinds of antiparallel beta-sheets; the approach can be extended to more general classes of beta-sheets. Experiments on real data show that the proposed algorithms achieve good prediction accuracy. We also show how the proposed algorithms relate to an existing grammar-based method, and we present an alternative grammatical model. Furthermore, we prove that prediction of planar beta-sheet structures is NP-hard.
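As a rough illustration of the kind of recurrence involved, the following sketch finds the best-scoring ladder of residue pairs (i, j), (i+1, j-1), ... between two antiparallel strands. It is not the algorithm proposed in this article. The function name `best_hairpin`, the toy scoring function `toy_score`, the hydrophobic residue set, and the `min_loop` parameter are all assumptions made for the example.

```python
# Hypothetical sketch: dynamic programming for one antiparallel two-strand ladder.
# Not the article's algorithm; the scoring scheme below is a toy assumption.

HYDROPHOBIC = set("VILFYW")  # assumed residue set, for illustration only


def toy_score(a, b):
    """Toy contact score: reward hydrophobic-hydrophobic pairs, penalize the rest."""
    return 1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else -1.0


def best_hairpin(seq, score=toy_score, min_loop=2):
    """Return (best_score, pairs) for the best ladder of antiparallel pairs.

    S[i][j] is the best score of a ladder (i, j), (i+1, j-1), ... whose
    outermost pair is (i, j). At least min_loop residues must lie strictly
    between the two residues of every pair.
    """
    n = len(seq)
    S = [[0.0] * n for _ in range(n)]
    best, best_ij = 0.0, None
    # Fill by increasing span j - i so that S[i+1][j-1] is ready when needed.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Cells for disallowed (too short) spans stay at 0.0, so the
            # ladder simply ends at (i, j) in that case.
            S[i][j] = score(seq[i], seq[j]) + max(0.0, S[i + 1][j - 1])
            if S[i][j] > best:
                best, best_ij = S[i][j], (i, j)
    if best_ij is None:
        return 0.0, []
    # Trace back from the outermost pair toward the turn.
    pairs = []
    i, j = best_ij
    while True:
        pairs.append((i, j))
        if j - i - 2 >= min_loop + 1 and S[i + 1][j - 1] > 0:
            i, j = i + 1, j - 1
        else:
            break
    return best, pairs


if __name__ == "__main__":
    print(best_hairpin("VIVGGVIV"))
```

The table has O(n^2) entries and each is filled in constant time. Handling sheets with more than two strands requires tracking additional strand boundaries, which this sketch does not attempt.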
