Stochastic modeling of RNA pseudoknotted structures: a grammatical approach

MOTIVATION: Modeling RNA pseudoknotted structures remains challenging. RNA stem-loops have been modeled successfully with stochastic context-free grammars (SCFGs) adapted from computational linguistics, but the crossing base pairs of pseudoknots make them harder to model. Formally, a context-sensitive grammar is required, which would impose a large increase in complexity.

RESULTS: We introduce a new grammar modeling approach for RNA pseudoknotted structures based on parallel communicating grammar systems (PCGS). The approach can specify pseudoknotted structures while avoiding context-sensitive rules, using a single context-free grammar (CFG) synchronized with a number of regular grammars. Technically, the stochastic version of the grammar model can be as simple as an SCFG. As with SCFGs, the new approach permits automatic generation of a single-RNA structure prediction algorithm for each specified pseudoknotted structure model. It also makes it possible to develop full probabilistic models of pseudoknotted structures, allowing the prediction of consensus structures by comparative analysis and structural homology recognition in database searches.
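To illustrate why pseudoknots fall outside what a context-free grammar can describe, the following minimal Python sketch checks a set of base pairs for crossings. It is not part of the PCGS method described here. The function name `has_crossing_pairs` and the two example structures are hypothetical and chosen only for illustration. Nested pairs, as in a stem-loop, behave like balanced brackets, whereas a pseudoknot contains at least two pairs (i, j) and (k, l) with i < k < j < l.

```python
from itertools import combinations


def has_crossing_pairs(pairs):
    """Return True if any two base pairs (i, j) and (k, l) satisfy i < k < j < l."""
    # Normalize each pair so the smaller index comes first, then sort by position.
    normalized = sorted(tuple(sorted(p)) for p in pairs)
    for (i, j), (k, l) in combinations(normalized, 2):
        # After sorting, i <= k always holds, so only this ordering needs checking.
        if i < k < j < l:
            return True
    return False


# Hypothetical stem-loop: every pair is nested inside the previous one.
stem_loop = [(0, 9), (1, 8), (2, 7)]

# Hypothetical pseudoknot: the second stem starts inside the first and ends outside it.
pseudoknot = [(0, 6), (1, 5), (3, 9), (4, 8)]

print(has_crossing_pairs(stem_loop))
print(has_crossing_pairs(pseudoknot))
```

The nested case is the one an SCFG handles directly. The crossing case is the one that motivates synchronizing a CFG with additional regular grammars.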
