Fast RNA Structure Alignment for Crossing Input Structures

The complexity of pairwise RNA structure alignment depends on the structural restrictions assumed for both the input structures and the computed consensus structure. For arbitrarily crossing input and consensus structures, the problem is NP-hard. For non-crossing consensus structures, Jiang et al.'s algorithm [1] computes the alignment in $O(n^2 m^2)$ time, where $n$ and $m$ denote the lengths of the two input sequences. If the input structures are also non-crossing, the problem corresponds to tree editing, which can be solved in $O(m^2 n(1+\log\frac{n}{m}))$ time [2]. We present a new algorithm that solves the problem for $d$-crossing structures in $O(d m^2 n \log n)$ time, where $d$ is a parameter that is one for non-crossing structures, bounded by $n$ for crossing structures, and much smaller than $n$ on most practical examples. Allowing crossing input structures enables applications in which the input is given not as a fixed structure but as base-pair probability matrices.
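The distinction between crossing and non-crossing structures can be illustrated with a minimal sketch. It assumes a structure is given as a list of base pairs `(i, j)` of sequence positions and uses the standard definition that two pairs cross when their endpoints interleave. The function names are hypothetical, and the sketch only tests for crossings; it does not compute the parameter $d$, whose precise definition is not given in this abstract.

```python
from itertools import combinations

def pairs_cross(p, q):
    """Return True if base pairs p and q interleave (i < k < j < l)."""
    # Normalize each pair so its left end comes first, then order the two pairs.
    (i, j), (k, l) = sorted((tuple(sorted(p)), tuple(sorted(q))))
    return i < k < j < l

def is_non_crossing(base_pairs):
    """Return True if no two base pairs in the structure cross."""
    return not any(pairs_cross(p, q) for p, q in combinations(base_pairs, 2))

# Illustrative inputs (assumed, not taken from the paper).
nested = [(0, 9), (1, 8), (3, 6)]   # properly nested stems
pseudoknot = [(0, 5), (3, 8)]       # interleaving pairs

print(is_non_crossing(nested))
print(is_non_crossing(pseudoknot))
```

This pairwise check takes quadratic time in the number of base pairs, which is sufficient for illustration; a stack-based scan would do the same test in linear time.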

[1] Michael Zuker, et al. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information, 1981, Nucleic Acids Res.

[2] Gary D. Stormo, et al. Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%, 2005, Bioinform.

[3] Serafim Batzoglou, et al. CONTRAfold: RNA secondary structure prediction without physics-based models, 2006, ISMB.

[4] Sonja J. Prohaska, et al. RNAs everywhere: genome-wide annotation of structured RNAs, 2007, Journal of experimental zoology. Part B, Molecular and developmental evolution.

[5] P. Stadler, et al. Secondary structure prediction for aligned RNA sequences, 2002, Journal of molecular biology.

[6] M. Waterman, et al. RNA secondary structure: a complete mathematical analysis, 1978.

[7] Erik D. Demaine, et al. An optimal decomposition algorithm for tree edit distance, 2006, TALG.

[8] Bin Ma, et al. A General Edit Distance between RNA Structures, 2002, J. Comput. Biol.

[9] Michal Ziv-Ukelson, et al. A Study of Accessible Motifs and RNA Folding Complexity, 2007, J. Comput. Biol.

[10] Ron Shamir, et al. A Faster Algorithm for RNA Co-folding, 2008, WABI.

[11] Patricia A. Evans. Finding common RNA pseudoknot structures in polynomial time, 2006, J. Discrete Algorithms.

[12] J. Gorodkin, et al. Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments, 2008, Nucleic acids research.

[13] D. Sankoff. Simultaneous Solution of the RNA Folding, Alignment and Protosequence Problems, 1985.

[14] Tatsuya Akutsu. Approximation and Exact Algorithms for RNA Secondary Structure Prediction and Recognition of Stochastic Context-free Languages, 1999, J. Comb. Optim.

[15] Rolf Backofen, et al. Inferring Noncoding RNA Families and Classes by Means of Genome-Scale Structure-Based Clustering, 2007, PLoS Comput. Biol.

[16] P. Stadler, et al. Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome, 2005, Nature Biotechnology.

[17] D. Turner, et al. Dynalign: an algorithm for finding the secondary structure common to two RNA sequences, 2002, Journal of molecular biology.

[18] R. Nussinov, et al. Fast algorithm for predicting the secondary structure of single-stranded RNA, 1980, Proceedings of the National Academy of Sciences of the United States of America.

[19] Philip N. Klein, et al. Computing the Edit-Distance between Unrooted Ordered Trees, 1998, ESA.

[20] Robert Giegerich, et al. A comprehensive comparison of comparative RNA structure prediction approaches, 2004, BMC Bioinformatics.

[21] Michael R. Fellows, et al. Algorithms and complexity for annotated sequence analysis, 1999.