The language of RNA: a formal grammar that includes pseudoknots

MOTIVATION: In a previous paper, we presented a polynomial time dynamic programming algorithm for predicting optimal RNA secondary structure including pseudoknots. However, a formal grammatical representation for RNA secondary structure with pseudoknots was still lacking.

RESULTS: Here we show a one-to-one correspondence between that algorithm and a formal transformational grammar. This grammar class encompasses the context-free grammars and goes beyond them to generate pseudoknotted structures. The pseudoknot grammar avoids the use of general context-sensitive rules by introducing a small number of auxiliary symbols used to reorder the strings generated by an otherwise context-free grammar. This formal representation of the residue correlations in RNA structure is important because it means we can build full probabilistic models of RNA secondary structure, including pseudoknots, and use them to optimally parse sequences in polynomial time.
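The distinction between structures a context-free grammar can generate and pseudoknotted ones can be made concrete with a small check on base-pair indices. The sketch below is an illustration only, not the algorithm or grammar of the paper: it assumes a structure is given as a list of 0-based position pairs, and the function name `has_pseudoknot` and both example structures are hypothetical. Nested pairs correspond to context-free residue correlations; two pairs (i, j) and (k, l) with i < k < j < l cross, which is the defining feature of a pseudoknot.

```python
from itertools import combinations


def has_pseudoknot(pairs):
    """Return True if any two base pairs cross, i.e. i < k < j < l.

    `pairs` is an iterable of (i, j) position pairs (hypothetical input
    format, chosen for this illustration).
    """
    # Put each pair in (smaller, larger) order, then sort by opening position.
    normalized = sorted(tuple(sorted(p)) for p in pairs)
    for (i, j), (k, l) in combinations(normalized, 2):
        # Sorting guarantees i <= k, so only the crossing test remains.
        if i < k < j < l:
            return True
    return False


# A plain stem: every pair is nested inside the previous one.
nested = [(0, 9), (1, 8), (2, 7)]

# Two interleaved stems, as in a simple pseudoknot (made-up coordinates).
knotted = [(0, 6), (1, 5), (3, 10), (4, 9)]

print(has_pseudoknot(nested))   # False
print(has_pseudoknot(knotted))  # True
```

In the second example the pairs (0, 6) and (3, 10) interleave, so no purely nested (context-free) derivation can account for both correlations at once; this is the kind of structure that the reordering symbols described above are meant to handle.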
