A Faster and More Space-Efficient Algorithm for Inferring Arc-Annotations of RNA Sequences Through Alignment

This paper considers the problem of inferring the optimal nested arc-annotation of a sequence, given another nested arc-annotated sequence, by maximizing the weighted alignment between the bases and arcs of the two sequences. The problem has a direct application in predicting the secondary structure of an RNA sequence from a closely related sequence whose secondary structure is already known. The most efficient previously known algorithm for this problem requires $O(nm^3)$ time and $O(nm^2)$ space, where $n$ is the length of the sequence with known arc-annotation and $m$ is the length of the sequence whose arc-annotation is to be inferred. We present an improved algorithm that runs in $\min\{O(nm^2 \log n), O(nm^3)\}$ time and $\min\{O(m^2 + mn), O(m^2 \log n)\}$ space. The time improvement is achieved by applying sparsification to the dynamic programming algorithm, while the space is reduced to a more practical quadratic complexity by combining a Hirschberg-like traceback technique with a simple compression.
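To illustrate the kind of traceback the abstract refers to, the following is a minimal sketch of the classic Hirschberg divide-and-conquer technique applied to the plain longest common subsequence problem. It is not the algorithm of this paper: it ignores arcs and weights entirely, and the function names `lcs_last_row` and `hirschberg_lcs` are hypothetical names chosen for this example. It only shows how an optimal solution can be recovered while keeping a single row of the dynamic programming table in memory at a time.

```python
def lcs_last_row(a, b):
    """Return a list whose entry j is the LCS length of a and b[:j].

    Only two rows of the DP table are alive at any time, so the
    working space is O(len(b)) rather than O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev


def hirschberg_lcs(a, b):
    """Recover one LCS of a and b without storing the full DP table."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""

    mid = len(a) // 2
    # Forward pass: first half of a against every prefix of b.
    left = lcs_last_row(a[:mid], b)
    # Backward pass: second half of a against every suffix of b,
    # computed on the reversed strings.
    right = lcs_last_row(a[mid:][::-1], b[::-1])

    # Choose the split point k of b where the two halves combine best.
    n = len(b)
    k = max(range(n + 1), key=lambda j: left[j] + right[n - j])

    # Solve the two independent subproblems recursively.
    return hirschberg_lcs(a[:mid], b[:k]) + hirschberg_lcs(a[mid:], b[k:])


if __name__ == "__main__":
    print(hirschberg_lcs("GACUUAGC", "GCUAGC"))  # GCUAGC
```

The sketch trades a constant-factor increase in time for linear working space on plain sequences; the quadratic space bound stated in the abstract concerns the richer arc-annotated problem, which additionally relies on the compression step that is not shown here.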
