Asymptotics of RNA Shapes

RNA shapes, introduced by Giegerich et al. (2004), provide a useful classification of the branching complexity of RNA secondary structures. In this paper, we derive an exact value for the asymptotic number of RNA shapes by relying on an elegant relation between non-ambiguous context-free grammars and generating functions. Our results provide a theoretical upper bound on the length of RNA sequences amenable to probabilistic shape analysis (Steffen et al., 2006; Voss et al., 2006), under the assumption that any base can base-pair with any other base. Since the relation between context-free grammars and asymptotic enumeration is simple, yet not well known in bioinformatics, we give a self-contained presentation with illustrative examples. Additionally, we prove a surprising one-to-one correspondence between π-shapes and Motzkin numbers.
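As a rough illustration of the grammar-to-generating-function relation mentioned above, the sketch below uses the textbook non-ambiguous grammar for Motzkin words, S → ε | . S | ( S ) S. This grammar is an assumption chosen for illustration; it is not the shape grammar of the paper, and it ignores constraints such as minimum hairpin loop length. Reading the grammar as an equation gives M(z) = 1 + z M(z) + z^2 M(z)^2, and the coefficients of M(z) are the Motzkin numbers. The function names `motzkin_numbers` and `count_by_enumeration` are hypothetical.

```python
from itertools import product


def motzkin_numbers(n_max):
    """Return [m_0, ..., m_n_max], the coefficients of
    M(z) = 1 + z*M(z) + z^2*M(z)^2 (illustrative grammar, see lead-in)."""
    m = [1]  # rule S -> epsilon: one word of length 0
    for n in range(1, n_max + 1):
        total = m[n - 1]  # rule S -> '.' S
        # rule S -> '(' S ')' S: split the remaining n - 2 letters
        # between the inner and the trailing S
        total += sum(m[k] * m[n - 2 - k] for k in range(n - 1))
        m.append(total)
    return m


def count_by_enumeration(n):
    """Brute-force check: count balanced words of length n over '.', '(', ')'."""
    count = 0
    for word in product(".()", repeat=n):
        depth = 0
        for ch in word:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:  # closing bracket without a partner
                    break
        else:
            if depth == 0:
                count += 1
    return count


if __name__ == "__main__":
    m = motzkin_numbers(400)
    print(m[:10])
    # The ratio of successive coefficients approaches the exponential
    # growth rate, which is 3 for Motzkin numbers.
    print(m[400] / m[399])
```

Because the grammar is non-ambiguous, each word has exactly one derivation, so the recurrence counts words rather than parse trees; the brute-force enumeration agrees with it for small lengths.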

[1]  Quentin Vicens,et al.  Atomic level architecture of group I introns revealed. , 2006, Trends in biochemical sciences.

[2]  G. Viennot,et al.  Enumeration of RNA Secondary Structures by Complexity , 1985 .

[3]  C. Burge,et al.  Vertebrate MicroRNA Genes , 2003, Science.

[4]  S Commans,et al.  Selenocysteine inserting tRNAs: an overview. , 1999, FEMS microbiology reviews.

[5]  Robert Giegerich,et al.  RNAshapes: an integrated RNA analysis package based on abstract shapes. , 2006, Bioinformatics.

[6]  Michel Termier,et al.  Towards a computational model for −1 eukaryotic frameshifting sites , 2003, Bioinform..

[7]  M. Bousquet-Mélou,et al.  Convex polyominoes and algebraic languages , 1992 .

[8]  John W. Moon,et al.  On an asymptotic method in enumeration , 1989, J. Comb. Theory, Ser. A.

[9]  W. Burnside Theory of Functions of a Complex Variable , 1893, Nature.

[10]  Jennifer A. Doudna,et al.  The chemical repertoire of natural ribozymes , 2002, Nature.

[11]  R. A. Silverman,et al.  Theory of Functions of a Complex Variable , 1968 .

[12]  Philippe Flajolet,et al.  Singularity Analysis of Generating Functions , 1990, SIAM J. Discret. Math..

[13]  Enrico Di Cera Thermodynamics in biology , 2000 .

[14]  Rolf Backofen,et al.  COMPUTATIONAL MOLECULAR BIOLOGY: AN INTRODUCTION , 2000 .

[15]  D. Turner,et al.  Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. , 1998, Biochemistry.

[16]  E. Bender Asymptotic Methods in Enumeration , 1974 .

[17]  Michael Zuker,et al.  RNA Secondary Structure Prediction , 2007, Current protocols in nucleic acid chemistry.

[18]  J. W. Brown,et al.  Complex Variables and Applications , 1985 .

[19]  R. Giegerich,et al.  Complete probabilistic analysis of RNA shapes , 2006, BMC Biology.

[20]  S. Beaucage,et al.  Current Protocols in Nucleic Acid Chemistry , 1999 .

[21]  R. Nussinov,et al.  Fast algorithm for predicting the secondary structure of single-stranded RNA. , 1980, Proceedings of the National Academy of Sciences of the United States of America.

[22]  Michael S. Waterman,et al.  Introduction to Computational Biology: Maps, Sequences and Genomes , 1998 .

[23]  R. Breaker,et al.  An mRNA structure that controls gene expression by binding FMN , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[24]  M. Waterman Secondary Structure of Single-Stranded Nucleic Acids , 1978 .

[25]  David B. Jaffe,et al.  The biology department , 2006 .

[26]  Philippe Flajolet Singular combinatorics , 2003 .

[27]  Christos H. Papadimitriou,et al.  Elements of the Theory of Computation , 1997, SIGA.

[28]  Michael J. E. Sternberg,et al.  Secondary structure prediction: Current Opinion in Structural Biology 1992, 2:237–241 , 1992 .

[29]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[30]  J. Sabina,et al.  Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. , 1999, Journal of molecular biology.

[31]  A. Odlyzko Asymptotic enumeration methods , 1996 .

[32]  D. Crothers,et al.  Improved estimation of secondary structure in ribonucleic acids. , 1973, Nature: New biology.

[33]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[34]  Carolyn J. Brown,et al.  The human XIST gene: Analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus , 1992, Cell.

[35]  R. Breaker,et al.  Computational design and experimental validation of oligonucleotide-sensing allosteric ribozymes , 2005, Nature Biotechnology.

[36]  J. McCaskill The equilibrium partition function and base pair binding probabilities for RNA secondary structure , 1990, Biopolymers.

[37]  Shirley Dex,et al.  JR 旅客販売総合システム(マルス)における運用及び管理について , 1991 .

[38]  A. Denise,et al.  Random generation of words of context-free languages according to the frequencies of letters , 2000 .

[39]  Bruno Salvy,et al.  GFUN: a Maple package for the manipulation of generating and holonomic functions in one variable , 1994, TOMS.

[40]  Michael S. Waterman,et al.  On some new sequences generalizing the Catalan and Motzkin numbers , 1979, Discret. Math..

[41]  J. S. Weinger,et al.  Substrate-assisted catalysis of peptide bond formation by the ribosome , 2004, Nature Structural & Molecular Biology.

[42]  Sanghoon Moon,et al.  Predicting genes expressed via −1 and +1 frameshifts , 2004, Nucleic acids research.

[43]  Michael Zuker,et al.  Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information , 1981, Nucleic Acids Res..

[44]  Markus E. Nebel,et al.  Combinatorial Properties of RNA Secondary Structures , 2003, J. Comput. Biol..

[45]  Peter Clote,et al.  Combinatorics of Saturated Secondary Structures of RNA , 2006, J. Comput. Biol..

[46]  D. Mathews Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. , 2004, RNA.

[47]  E. Rodney Canfield Remarks on an Asymptotic Method in Combinatorics , 1984, J. Comb. Theory, Ser. A.

[48]  Einar Andreas Rødland Pseudoknots in RNA Secondary Structures: Representation, Enumeration, and Prevalence , 2006, J. Comput. Biol..

[49]  Peter Walter,et al.  Signal recognition particle contains a 7S RNA essential for protein translocation across the endoplasmic reticulum , 1982, Nature.

[50]  D. Turner,et al.  Thermal unfolding of a group I ribozyme: the low-temperature transition is primarily disruption of tertiary structure. , 1993, Biochemistry.

[51]  Robert Giegerich,et al.  Abstract shapes of RNA. , 2004, Nucleic acids research.

[52]  Peter F. Stadler,et al.  Combinatorics of RNA Secondary Structures , 1998, Discret. Appl. Math..