An Efficient Algorithm to Compute the Landscape of Locally Optimal RNA Secondary Structures with Respect to the Nussinov-Jacobson Energy Model

We make a novel contribution to the theory of biopolymer folding by developing an efficient algorithm to compute the number of locally optimal secondary structures of an RNA molecule with respect to the Nussinov-Jacobson energy model. Additionally, we apply our algorithm to analyze the folding landscape of selenocysteine insertion sequence (SECIS) elements from A. Bock (personal communication), hammerhead ribozymes from Rfam (Griffiths-Jones et al., 2003), and tRNAs from Sprinzl's database (Sprinzl et al., 1998). It had previously been reported that tRNA has lower minimum free energy than random RNA of the same compositional frequency (Clote et al., 2003; Rivas and Eddy, 2000), although the situation is less clear for mRNA (Seffens and Digby, 1999; Workman and Krogh, 1999; Cohen and Skiena, 2002), which plays no structural role. Applications of our algorithm extend knowledge of the energy landscape differences between naturally occurring and random RNA. Given an RNA molecule a_1, ..., a_n and an integer k ≥ 0, a k-locally optimal secondary structure S is a secondary structure on a_1, ..., a_n that has k fewer base pairs than the maximum possible number, yet to which no base pair can be added without violating the definition of secondary structure (e.g., by introducing a pseudoknot). Although the number numStr(k) of k-locally optimal structures for a given RNA molecule is in general exponential in n, we present an algorithm running in time O(n^4) and space O(n^3) that computes numStr(k) for each k. Structurally important RNA, such as SECIS elements, hammerhead ribozymes, and tRNA, all have a markedly smaller number of k-locally optimal structures than random RNA of the same dinucleotide frequency, for small and moderate values of k. This suggests a potential future role of our algorithm as a tool to detect noncoding RNA genes.
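
To make the definition of numStr(k) concrete, the following is a minimal brute-force sketch in Python. It is not the O(n^4) algorithm of the paper: it enumerates every secondary structure and is therefore exponential in n, so it is only usable on very short sequences. The function names (`can_pair`, `structures`, `is_saturated`, `num_str`), the set of allowed pairs (Watson-Crick plus GU wobble), and the minimum hairpin threshold `theta=3` are assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter

# Assumption: Watson-Crick and GU wobble pairs are allowed.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(seq, i, j, theta):
    """True if positions i < j may pair: compatible bases, more than theta apart."""
    return j - i > theta and (seq[i], seq[j]) in PAIRS


def structures(seq, i, j, theta):
    """Yield every secondary structure on seq[i..j] as a frozenset of (i, j) pairs."""
    if i >= j:
        yield frozenset()
        return
    # Case 1: position i is unpaired.
    yield from structures(seq, i + 1, j, theta)
    # Case 2: position i pairs with r; the inside and outside fold independently,
    # which rules out crossing pairs (pseudoknots) by construction.
    for r in range(i + theta + 1, j + 1):
        if can_pair(seq, i, r, theta):
            for inner in structures(seq, i + 1, r - 1, theta):
                for outer in structures(seq, r + 1, j, theta):
                    yield inner | outer | {(i, r)}


def is_saturated(seq, pairs, theta):
    """True if no base pair can be added to `pairs` without crossing or reuse."""
    paired = {x for p in pairs for x in p}
    for x in range(len(seq)):
        if x in paired:
            continue
        for y in range(x + theta + 1, len(seq)):
            if y in paired or not can_pair(seq, x, y, theta):
                continue
            crosses = any(p < x < q < y or x < p < y < q for p, q in pairs)
            if not crosses:
                return False  # (x, y) could still be added
    return True


def num_str(seq, theta=3):
    """Map k to the number of saturated structures with (max - k) base pairs."""
    counts = Counter()
    for s in structures(seq, 0, len(seq) - 1, theta):
        if is_saturated(seq, s, theta):
            counts[len(s)] += 1
    best = max(counts)  # a maximum structure is always saturated
    return {best - size: c for size, c in sorted(counts.items(), reverse=True)}


if __name__ == "__main__":
    print(num_str("GGAAAACC"))  # {0: 1, 1: 2}
```

For the toy sequence GGAAAACC, the only structure with the maximum of two base pairs is {(0,7), (1,6)}, so numStr(0) = 1. The single-pair structures {(0,6)} and {(1,7)} are each saturated, because the one remaining candidate pair would cross the existing pair, so numStr(1) = 2.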

[1]  Peter Clote.  Protein Folding, the Levinthal Paradox and Rapidly Mixing Markov Chains, 1999, ICALP.

[2]  Thomas Tuschl, et al.  Sequence, chemical, and structural variation of small interfering RNAs and short hairpin RNAs and the effect on mammalian gene silencing, 2003, Antisense & nucleic acid drug development.

[3]  C. Lawrence, et al.  A statistical sampling algorithm for RNA secondary structure prediction, 2003, Nucleic acids research.

[4]  Peter F. Stadler, et al.  Dynamic Programming Algorithm for the Density of States of RNA Secondary Structures, 1996, German Conference on Bioinformatics.

[5]  Frank Thomson Leighton, et al.  Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete, 1998, RECOMB '98.

[6]  C. Levinthal.  Are there pathways for protein folding?, 1968.

[7]  Sean R. Eddy, et al.  Rfam: an RNA family database, 2003, Nucleic Acids Res.

[8]  Walter Fontana, et al.  Fast folding and comparison of RNA secondary structures, 1994.

[9]  P. Wolynes, et al.  Spin glasses and the statistical mechanics of protein folding, 1987, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Robert Giegerich, et al.  Reducing the Conformation Space in RNA Structure Prediction, 2001, German Conference on Bioinformatics.

[11]  M. Karplus, et al.  Kinetics of protein folding. A lattice model study of the requirements for folding to the native state, 1994, Journal of molecular biology.

[12]  Mark J. Gibbs, et al.  Sister-Scanning: a Monte Carlo procedure for assessing signals in recombinant sequences, 2000, Bioinform.

[13]  Thomas Tuschl, et al.  Functional genomics: RNA sets the standard, 2003, Nature.

[14]  P. Schuster, et al.  RNA folding at elementary step resolution, 1999, RNA.

[15]  Michael T. Wolfinger, et al.  Barrier Trees of Degenerate Landscapes, 2002.

[16]  M. Karplus, et al.  How does a protein fold?, 1994, Nature.

[17]  David Sankoff, et al.  RNA secondary structures and their prediction, 1984.

[18]  C. Lawrence, et al.  Statistical prediction of single-stranded regions in RNA secondary structure and application to predicting effective antisense target sites and beyond, 2001, Nucleic acids research.

[19]  A. Krogh, et al.  No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution, 1999, Nucleic acids research.

[20]  R. Nussinov, et al.  Fast algorithm for predicting the secondary structure of single-stranded RNA, 1980, Proceedings of the National Academy of Sciences of the United States of America.

[21]  J. Sabina, et al.  Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure, 1999, Journal of molecular biology.

[22]  E. Shakhnovich.  Theoretical studies of protein-folding thermodynamics and kinetics, 1997, Current opinion in structural biology.

[23]  Michael Zuker, et al.  Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information, 1981, Nucleic Acids Res.

[24]  Mihalis Yannakakis, et al.  On the Complexity of Protein Folding, 1998, J. Comput. Biol.

[25]  C. Anfinsen.  Principles that govern the folding of protein chains, 1973, Science.

[26]  J. McCaskill.  The equilibrium partition function and base pair binding probabilities for RNA secondary structure, 1990, Biopolymers.

[27]  David W. Digby, et al.  mRNAs have greater negative folding free energies than shuffled or codon choice randomized sequences, 1999, Nucleic acids research.

[28]  Steven Skiena, et al.  Natural Selection and Algorithmic Design of mRNA, 2003, J. Comput. Biol.

[29]  Mathias Sprinzl, et al.  Compilation of tRNA sequences and sequences of tRNA genes, 1993, Nucleic Acids Res.