RNAslider: a faster engine for consecutive windows folding and its application to the analysis of genomic folding asymmetry

Background
Scanning large genomes with a sliding window in search of locally stable RNA structures is a well-motivated problem in bioinformatics. Given a predefined window size L and an RNA sequence S of length N (L < N), the consecutive windows folding problem is to compute the minimal free energy (MFE) of folding for each of the L-sized substrings of S. The problem can be solved naively in O(NL^3) time by applying any of the classical cubic-time RNA folding algorithms to each of the N-L+1 windows of size L. Recently, an O(NL^2) solution for this problem has been described.

Results
Here, we describe and implement an O(NLψ(L)) engine for the consecutive windows folding problem, where ψ(L) is shown to converge to O(1) under the assumption of a standard probabilistic polymer folding model. This yields an O(L) speedup over the O(NL^2) solution, which we confirm experimentally. Using this tool, we note an intriguing directional (5'-3' vs. 3'-5') folding bias: in various genomic regions of several organisms, including regions that encode neither proteins nor ncRNA, the MFE of folding is higher in the native direction of the DNA than in the reverse direction. This bias largely emerges from the genomic dinucleotide bias, which affects the MFE; however, when the folding bias is normalized to the dinucleotide bias, we see some variation across genomic regions. We also present results from calculating the MFE landscape of mouse chromosome 1, characterizing the MFE of the long ncRNA molecules that reside in this chromosome.

Conclusion
The efficient consecutive windows folding engine described in this paper allows genome-wide scans for ncRNA molecules as well as large-scale statistics. It is implemented as a software tool, called RNAslider, and applied here to scanning long chromosomes, leading to the observation of features that are visible only on a large scale.
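
The consecutive windows folding problem can be made concrete with a small sketch. It assumes a toy energy model as a stand-in for real MFE: each Watson-Crick or G-U pair scores -1 and a hairpin must enclose at least three unpaired bases (Nussinov-style base-pair maximization, not the Turner nearest-neighbor model). The hypothetical `windows_naive` refolds every window in O(NL^3). The hypothetical `windows_shared` fills one table over all spans shorter than L, which suffices because the optimum of any substring depends only on substrings nested inside it, giving O(NL^2) total work. This is one way to reach the O(NL^2) bound, offered as an assumption rather than a description of the published method, and it is not RNAslider's O(NLψ(L)) engine.

```python
"""Toy consecutive windows folding: per-window refolding vs. one shared table.

Assumed energy model (a stand-in, not Turner/Zuker): every allowed pair
scores -1, allowed pairs are Watson-Crick plus G-U wobble, and a hairpin
loop must enclose at least MIN_LOOP unpaired bases.
"""

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3


def fold_energy(seq: str) -> int:
    """O(n^3) Nussinov-style DP: minimal toy energy of the whole of seq."""
    n = len(seq)
    if n == 0:
        return 0
    e = [[0] * n for _ in range(n)]  # e[i][j] = min energy of seq[i..j]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = e[i][j - 1]  # case 1: j stays unpaired
            for k in range(i, j - MIN_LOOP):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = e[i][k - 1] if k > i else 0
                    best = min(best, left - 1 + e[k + 1][j - 1])
            e[i][j] = best
    return e[0][n - 1]


def windows_naive(seq: str, L: int) -> list[int]:
    """Refold each of the N-L+1 windows from scratch: O(N L^3)."""
    return [fold_energy(seq[i:i + L]) for i in range(len(seq) - L + 1)]


def windows_shared(seq: str, L: int) -> list[int]:
    """Fill one table over all spans shorter than L: O(N L^2) in total.

    Every window seq[i..i+L-1] reuses the entries of its sub-windows, because
    the optimum of a substring depends only on substrings nested inside it.
    """
    n = len(seq)
    e = [[0] * L for _ in range(n)]  # e[i][d] = min energy of seq[i..i+d]
    for i in range(n - 1, -1, -1):  # rows below i are already complete
        for d in range(MIN_LOOP + 1, min(L, n - i)):
            j = i + d
            best = e[i][d - 1]  # j stays unpaired
            for k in range(i, j - MIN_LOOP):  # j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = e[i][k - 1 - i] if k > i else 0
                    inner = e[k + 1][j - k - 2]  # energy of seq[k+1..j-1]
                    best = min(best, left - 1 + inner)
            e[i][d] = best
    return [e[i][L - 1] for i in range(n - L + 1)]


if __name__ == "__main__":
    demo = "GGGAAAUCCGCAUGCUAGCAUUAGCGGCAAAGCC"
    print(windows_shared(demo, 12))
    print(windows_naive(demo, 12) == windows_shared(demo, 12))
```

Both functions return one value per window and agree on every input; only the amount of repeated work differs, since the shared table computes each (start, span) entry once instead of once per window containing it.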

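The directional bias and its normalization can be illustrated with two small helpers. Reading a strand 3'-5' turns every dinucleotide XY into YX, and in nearest-neighbor energy models the stacking energy of a pair step generally depends on its 5'-3' orientation, so an excess of XY over YX in the native direction is inverted in the reverse direction. A common way to separate such compositional effects from other signals, assumed here rather than taken from the paper, is to compare against shuffled sequences that keep every dinucleotide count. The sketch below uses the Euler-path idea of Altschul and Erickson; the function names are hypothetical and no folding model is included.

```python
"""Dinucleotide-aware controls for a 5'->3' vs 3'->5' folding comparison.

Assumptions (not from the paper): "reverse direction" means reading the same
strand 3'->5' (plain string reversal, no complement), and the control is an
Euler-path shuffle that keeps every dinucleotide count exactly.
"""
import random
from collections import Counter, defaultdict


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides, e.g. 'ACA' -> {'AC': 1, 'CA': 1}."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def _reaches_root(v: str, root: str, last_edge: dict) -> bool:
    """Follow the chosen last exit edges from v; True if they lead to root."""
    seen = set()
    while v != root:
        if v in seen:
            return False  # cycle: the chosen last edges do not form a tree
        seen.add(v)
        v = last_edge[v]
    return True


def dinucleotide_shuffle(seq: str, rng: random.Random) -> str:
    """Random sequence with the same ends and the same dinucleotide counts."""
    if len(seq) <= 2:
        return seq
    succ = defaultdict(list)  # multigraph: one edge per adjacent pair
    for a, b in zip(seq, seq[1:]):
        succ[a].append(b)
    first, root = seq[0], seq[-1]
    # Pick a last exit edge for every letter except the final one and retry
    # until these edges form a tree pointing at the final letter.
    while True:
        last_edge = {v: rng.choice(out) for v, out in succ.items() if v != root}
        if all(_reaches_root(v, root, last_edge) for v in last_edge):
            break
    order = {}
    for v, out in succ.items():
        rest = list(out)
        if v in last_edge:
            rest.remove(last_edge[v])  # drop one copy of the last edge
        rng.shuffle(rest)
        if v in last_edge:
            rest.append(last_edge[v])  # and force it to be used last
        order[v] = rest
    # Walking the reordered edges from the first letter uses every edge once.
    used = defaultdict(int)
    path = [first]
    for _ in range(len(seq) - 1):
        v = path[-1]
        path.append(order[v][used[v]])
        used[v] += 1
    return "".join(path)


if __name__ == "__main__":
    rng = random.Random(1)
    native = "GGCACAUGCAUGCCAACAGU"
    control = dinucleotide_shuffle(native, rng)
    print(control, dinucleotide_counts(control) == dinucleotide_counts(native))
    # Reading 3'->5' swaps every XY for YX.
    fwd, rev = dinucleotide_counts(native), dinucleotide_counts(native[::-1])
    print(all(rev[d[::-1]] == c for d, c in fwd.items()))
```

Shuffled controls built this way could be folded alongside the native and reversed windows, so that the native-versus-reverse MFE difference is measured against what dinucleotide composition alone would predict. The retry loop terminates because the native sequence's own last exits always form a valid tree.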