Efficient sampling of RNA secondary structures from the Boltzmann ensemble of low-energy

We adapt here a surprising technique, the boustrophedon method, to speed up the sampling of RNA secondary structures from the Boltzmann ensemble of low-energy structures. The technique is simple and straightforward to implement, since it only permutes the order of some operations already performed in the stochastic traceback stage of these algorithms. It nevertheless greatly improves their worst-case complexity from $\mathcal{O}(n^2)$ to $\mathcal{O}(n\log n)$, where $n$ is the length of the original sequence. Moreover, in a Boltzmann-weighted homopolymer model based on the Nussinov–Jacobson free-energy model, the average-case complexity of the generation is shown to improve from $\mathcal{O}(n\sqrt{n})$ to $\mathcal{O}(n\log n)$. These results are extended to the more realistic Turner free-energy model through experiments performed on both structured (Drosophila melanogaster 5S rRNA) and hybrid (Staphylococcus aureus RNAIII) RNA sequences, using a boustrophedon-modified version of the popular software UnaFold. This improvement makes it possible to sample larger, and therefore statistically more significant, sets of structures in a given time.
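The traceback reordering can be illustrated with a minimal Python sketch. It assumes a Boltzmann-weighted homopolymer in the spirit of the Nussinov–Jacobson model: any two positions enclosing at least `theta` unpaired bases may pair, and every base pair contributes one weight `w` (for instance exp(-E/RT) for an assumed per-pair energy E). The names `partition_table`, `candidate_order` and `sample` are hypothetical, and examining the "unpaired" case first is our own choice, not necessarily the paper's. Because both orders examine the same candidate weights, the sampled distribution is identical; only `work`, the number of weights examined before the random draw is resolved, changes.

```python
import random


def partition_table(n, w, theta=1):
    """Partition function of a homopolymer of length n (Nussinov-style model).

    Z[i][j] is the weighted number of structures on the half-open interval
    [i, j). Two positions may pair if they enclose at least `theta` unpaired
    bases; each pair contributes the factor w (assumed: exp(-E_pair / RT)).
    Plain floats are used, so very long sequences would need rescaling.
    """
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]  # intervals too short to pair: 1
    for length in range(theta + 2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = Z[i + 1][j]  # position i left unpaired
            for k in range(i + theta + 1, j):  # position i paired with k
                total += w * Z[i + 1][k] * Z[k + 1][j]
            Z[i][j] = total
    return Z


def candidate_order(lo, hi, boustrophedon):
    """Yield every candidate partner k in [lo, hi) exactly once.

    Sequential order scans left to right; boustrophedon order alternates
    between the two ends: lo, hi-1, lo+1, hi-2, ...
    """
    if not boustrophedon:
        yield from range(lo, hi)
        return
    left, right = lo, hi - 1
    while left <= right:
        yield left
        if left != right:
            yield right
        left, right = left + 1, right - 1


def sample(Z, n, w, theta, rng, boustrophedon=True):
    """Stochastic traceback: draw one structure with probability proportional
    to its weight. Returns (dot-bracket string, work), where work counts the
    weights examined while choosing cases."""
    s = ["."] * n
    work = 0
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if j - i < theta + 2:  # too short to hold a pair: all unpaired
            continue
        r = rng.random() * Z[i][j]
        work += 1
        r -= Z[i + 1][j]  # case 1: position i unpaired
        if r < 0:
            stack.append((i + 1, j))
            continue
        for k in candidate_order(i + theta + 1, j, boustrophedon):
            work += 1
            r -= w * Z[i + 1][k] * Z[k + 1][j]
            if r < 0:
                break
        # If rounding leaves r >= 0, k is simply the last candidate examined.
        s[i], s[k] = "(", ")"
        stack.append((i + 1, k))  # interval enclosed by the new pair
        stack.append((k + 1, j))  # interval to the right of the new pair
    return "".join(s), work


if __name__ == "__main__":
    rng = random.Random(0)
    n, w, theta = 150, 1.0, 1
    Z = partition_table(n, w, theta)
    for mode in (False, True):
        runs = [sample(Z, n, w, theta, rng, mode)[1] for _ in range(200)]
        label = "boustrophedon" if mode else "sequential"
        print(label, sum(runs) / len(runs))
```

The `__main__` block prints the average `work` of both orders on a 150-nt homopolymer; no particular numbers are claimed here. The point of the design is that scanning from both ends makes the cost of locating a pair partner proportional to its distance from the nearer end of the interval, rather than from the left end only.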