Exact Pattern Matching for RNA Structure Ensembles

ExpaRNA's core algorithm computes, for two fixed RNA structures, a maximal non-overlapping set of maximal exact matchings. We introduce ExpaRNA-P, an algorithm that solves the lifted problem of finding such sets of exact matchings in the entire Boltzmann-distributed structure ensembles of two RNAs. Owing to a novel kind of structural sparsification, the new algorithm retains the time and space complexity of the algorithm for fixed input structures. Furthermore, we generalize the chaining algorithm of ExpaRNA to compute a compatible subset of ExpaRNA-P's exact matchings. We show that ExpaRNA-P outperforms ExpaRNA in BRAliBase 2.1 benchmarks, where we pass the chained exact matchings as anchor constraints to the RNA alignment tool LocARNA. Compared to running LocARNA without these anchors, this approach achieves similar accuracy but is six times faster.
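The chaining step mentioned above can be illustrated with a deliberately simplified sketch. It is not the ExpaRNA or ExpaRNA-P chaining algorithm: it assumes that each exact matching is a single contiguous interval in both sequences and ignores the nested, structure-induced gaps that real sequence-structure matchings can contain. The `Match` type, its fields, and the `chain` function are hypothetical names introduced only for this illustration. Under those assumptions, selecting a compatible (non-overlapping, order-preserving) subset of maximum total score reduces to a quadratic-time dynamic program:

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """Hypothetical exact match: closed intervals in sequences A and B."""
    a_start: int
    a_end: int
    b_start: int
    b_end: int
    score: int


def chain(matches: List[Match]) -> List[Match]:
    """Return a maximum-score subset of matches that do not overlap
    and appear in the same order in both sequences (simplified model)."""
    if not matches:
        return []
    # Any valid predecessor ends before its successor starts in A,
    # so sorting by a_end places predecessors first.
    ms = sorted(matches, key=lambda m: (m.a_end, m.b_end))
    best = [m.score for m in ms]   # best chain score ending at each match
    prev = [-1] * len(ms)          # back-pointers for traceback
    for j, mj in enumerate(ms):
        for i in range(j):
            mi = ms[i]
            compatible = mi.a_end < mj.a_start and mi.b_end < mj.b_start
            if compatible and best[i] + mj.score > best[j]:
                best[j] = best[i] + mj.score
                prev[j] = i
    # Trace back from the best-scoring chain end.
    j = max(range(len(ms)), key=best.__getitem__)
    result = []
    while j != -1:
        result.append(ms[j])
        j = prev[j]
    return result[::-1]
```

In a pipeline like the one described, the selected matches would then serve as anchor constraints for the alignment tool; how those constraints are passed to LocARNA is not specified here and is left out of the sketch.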
