Kisses, ambivalent models and more: Contributions to the analysis of RNA secondary structure

The full functional role of RNA in all domains of life has yet to be explored. Deep sequencing technologies generate massive amounts of data on RNA transcripts with functional potential. To decipher this information, bioinformatics methods for structural analysis are in demand. With this thesis, we aim to improve current secondary structure prediction in several respects. The introductory chapter explains Algebraic Dynamic Programming (ADP), with a focus on its convenient but atypical style of specifying algorithms. We then present five contributions to the analysis of RNA secondary structures:

1. It is the nature of models to abstract and simplify reality in order to master its complexity. Chapter 3 is an in-depth analysis of four popular computational models of RNA secondary structure (Programs RNAshapes and RNAalishapes).
2. The secondary structure of RNA is too dynamic to be described by a single structure; in turn, there is no single optimal secondary structure. Thus, we compute the most likely abstract shape of a given RNA sequence. Chapter 4 discusses improvements to the algorithms for computing the likelihood of abstract shapes, specifically with regard to computational speed (Program RapidShapes).
3. For reasons of computational complexity, models of RNA structure commonly exclude crossing base pairs, the so-called "pseudoknots", from the secondary structure. In Chapter 5, we introduce a heuristic for mastering a frequent type of pseudoknot: "kissing hairpins" (Program pKiss).
4. In Chapter 6, we revisit the old algorithmic idea of outside-in computation for the new programming framework Bellman’s GAP. This broadens the arsenal of rapid-prototyping algorithms for RNA and other sequential problems, and it adds "outside" and maximum expected accuracy ("MEA") functionality to RNAshapes and RNAalishapes.
5. Covariance models (CMs) representing RNA families assume a single consensus secondary structure for a set of related RNAs and serve as statistical tools to search for additional family members. In Chapter 7, we evaluate CM scoring schemes that are more structure-specific than the standard sequence-to-model alignments. Furthermore, we introduce a technique to incorporate "ambivalent" consensus structures into covariance models (Program aCMs).

The results of this work are available at the Bielefeld Bioinformatics Server (BiBiServ). The RNA Studio (http://bibiserv.cebitec.uni-bielefeld.de/rna) supports ready-to-use web submissions, web services and cloud computing for the programs developed in this thesis. Debian packages offer a simple way to install our software on a local machine. Developers can benefit from our algorithmic analyses or use our sources as a primer for rapid prototyping of new implementations: http://bibiserv.cebitec.uni-bielefeld.de/fold-grammars.
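The central ADP idea, separating the search space (a grammar) from its evaluation (an algebra), can be illustrated with a small Python sketch. One simple base-pair grammar is written once and evaluated under three interchangeable algebras. This is not Bellman's GAP syntax and not one of the thesis grammars; all names (`fold`, `MaxPairs`, `Count`, `DotBracket`, `MIN_LOOP`, `PAIRS`) are hypothetical, and the scoring is a toy base-pair count rather than a thermodynamic model.

```python
from functools import lru_cache

MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


class MaxPairs:
    """Algebra: maximise the number of base pairs."""
    def nil(self): return 0
    def unpaired(self, c, x): return x
    def paired(self, a, x, b, y): return x + 1 + y
    def h(self, xs): return [max(xs)]


class Count:
    """Algebra: count all candidate structures."""
    def nil(self): return 1
    def unpaired(self, c, x): return x
    def paired(self, a, x, b, y): return x * y
    def h(self, xs): return [sum(xs)]


class DotBracket:
    """Algebra: build every candidate as a dot-bracket string (no reduction)."""
    def nil(self): return ""
    def unpaired(self, c, x): return "." + x
    def paired(self, a, x, b, y): return "(" + x + ")" + y
    def h(self, xs): return xs


def fold(seq, alg):
    """Evaluate one fixed grammar under the algebra `alg`.

    Grammar for the subword seq[i:j]:
      empty                                     -> nil
      seq[i] unpaired, followed by seq[i+1:j]   -> unpaired
      seq[i] pairs with seq[k], enclosing seq[i+1:k],
      followed by seq[k+1:j]                    -> paired
    """
    @lru_cache(maxsize=None)
    def struct(i, j):
        if i == j:
            return alg.h([alg.nil()])
        candidates = [alg.unpaired(seq[i], x) for x in struct(i + 1, j)]
        for k in range(i + MIN_LOOP + 1, j):
            if (seq[i], seq[k]) in PAIRS:
                for x in struct(i + 1, k):
                    for y in struct(k + 1, j):
                        candidates.append(alg.paired(seq[i], x, seq[k], y))
        return alg.h(candidates)  # the choice function reduces the candidates

    return struct(0, len(seq))
```

For example, `fold("GGGAAAUCC", MaxPairs())` yields `[3]`, while `Count()` and `DotBracket()` reuse exactly the same grammar to count or list all candidate structures. Changing the algebra never requires touching the recursion, which is the convenience the ADP style aims for.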
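The notion of a shape probability can be made concrete with a minimal Python sketch, assuming the coarsest shape abstraction (only helix nesting is kept; unpaired bases, bulges and interior loops are ignored) and a toy energy of -1 per base pair with kT = 1 in arbitrary units. The probability of a shape is the sum of the Boltzmann weights of all structures with that shape, divided by the partition function Z. RNAshapes and RapidShapes use the full nearest-neighbor energy model and avoid explicit enumeration; enumeration is used here only to show the definition. All function names are hypothetical, and writing `_` for the open chain is a convention of this sketch.

```python
import math
from collections import defaultdict
from functools import lru_cache

MIN_LOOP = 3  # assumed minimum hairpin loop length
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def shape5(db):
    """Coarsest abstract shape of a dot-bracket string: helix nesting only."""
    stack, partner = [], {}
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            partner[stack.pop()] = pos  # opening index -> closing index

    def children(i, j):
        """Base pairs directly enclosed in the interval db[i:j]."""
        found, k = [], i
        while k < j:
            if k in partner:
                found.append((k, partner[k]))
                k = partner[k] + 1
            else:
                k += 1
        return found

    def helix(i, j):
        inner = children(i + 1, j)
        while len(inner) == 1:  # stack, bulge or interior loop: same helix
            i, j = inner[0]
            inner = children(i + 1, j)
        return "[" + "".join(helix(a, b) for a, b in inner) + "]"

    return "".join(helix(a, b) for a, b in children(0, len(db))) or "_"


def structures(seq):
    """Enumerate all nested secondary structures (exponential; toy sizes only)."""
    @lru_cache(maxsize=None)
    def rec(i, j):
        if i == j:
            return ("",)
        out = ["." + s for s in rec(i + 1, j)]
        for k in range(i + MIN_LOOP + 1, j):
            if (seq[i], seq[k]) in PAIRS:
                out += ["(" + x + ")" + y for x in rec(i + 1, k) for y in rec(k + 1, j)]
        return tuple(out)

    return rec(0, len(seq))


def shape_probabilities(seq, kT=1.0):
    """p(shape) = sum of Boltzmann weights of the shape's structures / Z."""
    weight = defaultdict(float)
    for db in structures(seq):
        energy = -1.0 * db.count("(")  # toy energy model: -1 per base pair
        weight[shape5(db)] += math.exp(-energy / kT)
    z = sum(weight.values())
    return {shape: w / z for shape, w in weight.items()}
```

The most likely shape is then simply `max(probs, key=probs.get)`. For `"GGGAAAUCC"` only the open chain `_` and the single-helix shape `[]` are possible, and `[]` wins under this toy energy.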
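What "crossing base pairs" means can be stated precisely: two pairs (i, j) and (k, l) cross if i < k < j < l. The following Python sketch parses a dot-bracket string that uses additional bracket types for pseudoknotted pairs (a common notation, assumed here) and lists all crossing pairs. It only detects pseudoknots in a given structure; it is not the pKiss prediction heuristic, and the function names are hypothetical.

```python
def dot_bracket_pairs(db, brackets=("()", "[]", "{}", "<>")):
    """Parse an extended dot-bracket string; each bracket type has its own stack."""
    openers = {b[0]: [] for b in brackets}
    opener_of = {b[1]: b[0] for b in brackets}
    pairs = []
    for pos, ch in enumerate(db):
        if ch in openers:
            openers[ch].append(pos)
        elif ch in opener_of:
            pairs.append((openers[opener_of[ch]].pop(), pos))
    return pairs


def crossing_pairs(pairs):
    """All pairs of base pairs (i, j), (k, l) with i < k < j < l."""
    ps = sorted((min(a, b), max(a, b)) for a, b in pairs)
    crossings = []
    for x, (i, j) in enumerate(ps):
        for k, l in ps[x + 1:]:
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings
```

For a nested structure such as `"((((...))))"` the result is empty, whereas in the simple pseudoknot `"((((..[[[..))))..]]]"` each of the four `()` pairs crosses each of the three `[]` pairs, giving 12 crossings. Excluding exactly these configurations is what keeps the standard nested folding recursions efficient.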
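The inside/outside idea behind base-pair probabilities and MEA structures can be sketched on a toy model. An inside pass computes partition functions of subwords; an outside pass pushes context weights back down through the same grammar rules; the product of the two gives base-pair probabilities, which an MEA pass then turns into a single structure. Assumptions of this sketch: every base pair multiplies a structure's weight by a constant Boltzmann factor `w`, the grammar is an unambiguous "first base unpaired or paired" recursion, and the MEA objective is one common formulation (2γ·P for chosen pairs plus the unpaired probability of each unpaired base). This is not code generated by Bellman's GAP, and all names are hypothetical.

```python
MIN_LOOP = 3  # assumed minimum hairpin loop length
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_ends(seq, i, j):
    """Partners k of seq[i] allowed inside the subword seq[i:j]."""
    return [k for k in range(i + MIN_LOOP + 1, j) if (seq[i], seq[k]) in PAIRS]


def pair_probabilities(seq, w=2.0):
    """Base-pair probabilities P[i][k] (i < k) by inside-outside."""
    n = len(seq)
    Q = [[1.0 if i == j else 0.0 for j in range(n + 1)] for i in range(n + 1)]
    for length in range(1, n + 1):  # inside pass: short subwords first
        for i in range(n - length + 1):
            j = i + length
            Q[i][j] = Q[i + 1][j] + sum(
                w * Q[i + 1][k] * Q[k + 1][j] for k in pair_ends(seq, i, j))

    O = [[0.0] * (n + 1) for _ in range(n + 1)]  # outside (context) weights
    O[0][n] = 1.0
    P = [[0.0] * n for _ in range(n)]
    for length in range(n, 0, -1):  # outside pass: long subwords first
        for i in range(n - length + 1):
            j = i + length
            O[i + 1][j] += O[i][j]  # context passed down by "seq[i] unpaired"
            for k in pair_ends(seq, i, j):
                O[i + 1][k] += O[i][j] * w * Q[k + 1][j]
                O[k + 1][j] += O[i][j] * w * Q[i + 1][k]
                P[i][k] += O[i][j] * w * Q[i + 1][k] * Q[k + 1][j]
    Z = Q[0][n]
    return [[p / Z for p in row] for row in P]


def mea_structure(seq, P, gamma=1.0):
    """Structure maximising sum(2*gamma*P[i][k]) over pairs + sum(q[i]) over unpaired i."""
    n = len(seq)
    q = [1.0 - sum(P[i]) - sum(P[h][i] for h in range(n)) for i in range(n)]
    M = [[0.0] * (n + 1) for _ in range(n + 1)]
    choice = [[-1] * (n + 1) for _ in range(n + 1)]  # -1: seq[i] unpaired, else partner k
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best, arg = q[i] + M[i + 1][j], -1
            for k in pair_ends(seq, i, j):
                score = 2 * gamma * P[i][k] + M[i + 1][k] + M[k + 1][j]
                if score > best:
                    best, arg = score, k
            M[i][j], choice[i][j] = best, arg

    def trace(i, j):
        if i == j:
            return ""
        k = choice[i][j]
        if k < 0:
            return "." + trace(i + 1, j)
        return "(" + trace(i + 1, k) + ")" + trace(k + 1, j)

    return trace(0, n)
```

For `"GAAAC"` with `w = 2`, the only possible pair (0, 4) occurs in structures of total weight 2 out of Z = 3, so its probability is 2/3 and `mea_structure` returns `"(...)"`. Because every outside update mirrors one inside rule, the outside pass can be derived mechanically from the grammar, which is what makes it attractive to automate in a framework.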