Stable stem enabled Shannon entropies distinguish non-coding RNAs from random backgrounds

The computational identification of RNAs in genomic sequences requires detecting signals that are characteristic of RNA sequences. Shannon base pairing entropy is an indicator of how certain an RNA secondary structure fold is, and it has been used in the detection of structural non-coding RNAs (ncRNAs). Under the Boltzmann ensemble of secondary structures, the probability of a base pair is estimated from its frequency across all the alternative equilibrium structures. However, such an entropy has yet to deliver the desired performance in distinguishing ncRNAs from random sequences. Developing novel methods to improve the performance of the entropy measure may result in more effective ncRNA gene finding based on structure detection. This paper shows that the performance of base pairing entropy as a measure can be significantly improved with a constrained secondary structure ensemble in which only canonical base pairs are assumed to occur and every stem in a fold is required to be energetically stable. This constraint reduces the space of secondary structures and may lower the probabilities of base pairs unfavorable to the native fold. Indeed, base pairing entropies computed with this constrained model show substantially narrowed gaps in Z-scores between ncRNAs, as well as drastic increases in the Z-score for all 13 tested ncRNA sets, compared to shuffled sequences.
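To make the two quantities in the abstract concrete, the following is a minimal sketch of how a base pairing entropy and a Z-score against shuffled sequences might be computed. The abstract does not give the exact formula, so the per-base entropy below (which also counts the probability of a base being unpaired) is an assumed, commonly used form, and the names `base_pair_entropy`, `z_score`, and `pair_probs` are hypothetical. The pairing probabilities are taken as given; computing them from a Boltzmann ensemble, constrained or not, is outside this sketch.

```python
import math
import statistics


def base_pair_entropy(pair_probs, n):
    """Average per-base Shannon entropy (in bits) of pairing probabilities.

    pair_probs maps (i, j), with i < j, to the probability that bases i and j
    are paired in the ensemble. n is the sequence length.
    Assumed definition: for each base, the distribution is over all possible
    partners plus the "unpaired" state.
    """
    total = 0.0
    for i in range(n):
        # Probabilities of base i pairing with each possible partner.
        partner_probs = [p for (a, b), p in pair_probs.items() if i in (a, b)]
        # Remaining probability mass is the chance that base i is unpaired.
        unpaired = max(0.0, 1.0 - sum(partner_probs))
        for p in partner_probs + [unpaired]:
            if p > 0.0:
                total -= p * math.log2(p)
    return total / n


def z_score(value, background):
    """Standard score of `value` relative to a background sample,
    e.g. entropies of shuffled versions of the same sequence."""
    mean = statistics.mean(background)
    sd = statistics.stdev(background)  # sample standard deviation
    return (value - mean) / sd


# A fold in which every base pair is certain has zero entropy.
certain = base_pair_entropy({(0, 3): 1.0}, n=4)

# A pair that forms half of the time contributes one bit per base involved.
uncertain = base_pair_entropy({(0, 1): 0.5}, n=2)
```

Under this convention a lower entropy means a more certain fold, so a sequence that folds more decisively than its shuffled versions gets a negative Z-score; the sign would flip if the score were defined the other way around.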
