A dynamic programming algorithm for RNA structure prediction including pseudoknots.

We describe a dynamic programming algorithm for predicting optimal RNA secondary structure, including pseudoknots. The algorithm has a worst-case complexity of O(N^6) in time and O(N^4) in storage, where N is the sequence length. Because the algorithm is complex to describe, we adopt a graphical representation (Feynman diagrams) borrowed from quantum field theory. We present an implementation of the algorithm that generates the optimal (minimum energy) structure for a single RNA sequence, using standard RNA folding thermodynamic parameters augmented by a few parameters describing the thermodynamic stability of pseudoknots. We demonstrate the properties of the algorithm by using it to predict structures for several small pseudoknotted and non-pseudoknotted RNAs. Although the time and memory demands of the algorithm are steep, we believe this is the first algorithm able to fold optimal (minimum energy) pseudoknotted RNAs with the accepted RNA thermodynamic model.
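For readers unfamiliar with dynamic programming over RNA structures, the sketch below may help. It is not the algorithm described above: it is the much simpler Nussinov-style recursion, which only maximizes the number of nested base pairs, cannot represent pseudoknots, and uses no thermodynamic parameters. It runs in O(N^3) time and O(N^2) storage, which gives a rough sense of how much the pseudoknot extension costs. The names `can_pair`, `nussinov`, and `min_loop` are illustrative assumptions, not taken from the paper.

```python
# Assumption: Watson-Crick pairs plus the G-U wobble pair count equally.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True if nucleotides a and b can form a base pair."""
    return (a, b) in PAIRS


def nussinov(seq, min_loop=3):
    """Maximum number of nested (non-pseudoknotted) base pairs in seq.

    min_loop is the minimum number of unpaired bases enclosed by a pair
    (a hypothetical parameter of this sketch).
    """
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    # Fill the table by increasing subsequence length.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position i is left unpaired.
            score = best[i + 1][j]
            # Case 2: position i pairs with some k, splitting the problem
            # into the region inside the pair and the region after it.
            for k in range(i + min_loop + 1, j + 1):
                if can_pair(seq[i], seq[k]):
                    inside = best[i + 1][k - 1]
                    after = best[k + 1][j] if k + 1 <= j else 0
                    score = max(score, 1 + inside + after)
            best[i][j] = score
    return best[0][n - 1]
```

Because every pair (i, k) splits the sequence into two independent subproblems, this recursion can never produce crossing pairs, which is exactly the restriction a pseudoknot-capable algorithm has to lift.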
