A Faster and More Space-Efficient Algorithm for Inferring Arc-Annotations of RNA Sequences through Alignment

The nested arc-annotation of a sequence is an important model for representing structural information in RNA and protein sequences. Given two sequences S1 and S2 and a nested arc-annotation P1 for S1, this paper considers the problem of inferring a nested arc-annotation P2 for S2 such that (S1, P1) and (S2, P2) have the largest common substructure. The problem has a direct application in predicting the secondary structure of an RNA sequence from a closely related sequence whose secondary structure is known. The most efficient previously known algorithm for this problem requires O(nm^3) time and O(nm^2) space, where n is the length of the sequence with known arc-annotation and m is the length of the sequence whose arc-annotation is to be inferred. By using sparsification on a new recursive dynamic programming algorithm and applying a Hirschberg-like traceback technique with compression, we obtain an improved algorithm that runs in min{O(nm^2 + n^2 m), O(nm^2 log n), O(nm^3)} time and min{O(m^2 + mn), O(m^2 log n + n)} space.
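
To make the notion of a nested arc-annotation concrete, the following is a minimal Python sketch that checks whether a set of arcs is nested under the usual definition: no two arcs share an endpoint and no two arcs cross. It is not part of the paper's algorithm. The function name `is_nested`, the 0-based positions, and the representation of arcs as `(i, j)` pairs are assumptions made for illustration.

```python
def is_nested(arcs, length):
    """Return True if `arcs` is a nested arc-annotation on positions 0..length-1.

    Hypothetical helper for illustration only: arcs are (i, j) pairs with i < j.
    """
    partner = {}
    for i, j in arcs:
        # Each arc must join two distinct, in-range positions.
        if not (0 <= i < j < length):
            return False
        # No position may be an endpoint of more than one arc.
        if i in partner or j in partner:
            return False
        partner[i] = j
        partner[j] = i

    # Scan left to right, treating arcs like matched brackets:
    # a right endpoint must close the most recently opened arc.
    open_arcs = []
    for pos in range(length):
        if pos not in partner:
            continue
        if partner[pos] > pos:
            open_arcs.append(pos)          # left endpoint opens an arc
        elif not open_arcs or open_arcs.pop() != partner[pos]:
            return False                   # two arcs cross
    return True
```

For instance, the arcs `(0, 5)` and `(1, 4)` on a sequence of length 6 are nested, whereas `(0, 3)` and `(1, 4)` cross and are rejected. The stack-based scan runs in time linear in the sequence length plus the number of arcs.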
