Towards Optimally Solving the LONGEST COMMON SUBSEQUENCE Problem for Sequences with Nested Arc Annotations in Linear Time

We present exact algorithms for the NP-complete Longest Common Subsequence problem for sequences with nested arc annotations, a problem that arises in the structure comparison of RNA. Given two sequences of length at most $n$, each with a nested arc structure, our algorithm determines in time $O(3.31^{k_1+k_2} \cdot n)$ whether there is an arc-preserving common subsequence that can be obtained by deleting $k_1$ letters from the first sequence and $k_2$ letters from the second (together with the corresponding arcs), and outputs one if it exists. Thus, the problem is fixed-parameter tractable when parameterized by the number of deletions. This complements known approximation results, which give a quadratic-time factor-2 approximation for the general problem and polynomial-time approximation schemes for restricted versions. In addition, we obtain further fixed-parameter tractability results for these restricted versions.
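To illustrate what parameterizing by the number of deletions means, the following is a minimal sketch of a bounded search tree for the much simpler case of plain sequences without arc annotations. It is not the algorithm of the paper: the function name, the "at most" reading of the deletion budgets, and the two-way branching are assumptions made for illustration only. Handling nested arcs requires a more involved case distinction, which is where the larger branching base 3.31 comes from.

```python
def common_subsequence_within_budget(s1, s2, k1, k2):
    """Return a common subsequence obtainable by deleting at most k1
    letters from s1 and at most k2 letters from s2, or None.

    Illustrative only: plain sequences, no arc annotations.
    """

    def search(i, j, b1, b2):
        # Equal leading letters can always be matched without loss.
        matched = []
        while i < len(s1) and j < len(s2) and s1[i] == s2[j]:
            matched.append(s1[i])
            i += 1
            j += 1
        prefix = "".join(matched)

        rest1 = len(s1) - i
        rest2 = len(s2) - j
        if rest1 == 0 or rest2 == 0:
            # One sequence is exhausted: the remainder of the other
            # must be deleted entirely, if the budget allows it.
            if rest1 <= b1 and rest2 <= b2:
                return prefix
            return None

        # Mismatch: branch on which sequence loses its current letter.
        # Every branch spends one unit of budget, so the search tree
        # has depth at most k1 + k2.
        if b1 > 0:
            tail = search(i + 1, j, b1 - 1, b2)
            if tail is not None:
                return prefix + tail
        if b2 > 0:
            tail = search(i, j + 1, b1, b2 - 1)
            if tail is not None:
                return prefix + tail
        return None

    return search(0, 0, k1, k2)


result = common_subsequence_within_budget("ACGU", "AGGU", 1, 1)
```

Because each branching step consumes one deletion from either budget, this toy search tree has at most $2^{k_1+k_2}$ leaves, so the exponential part of the running time depends only on the parameters and not on the sequence length. That is the sense in which a bound of the form $O(c^{k_1+k_2} \cdot n)$ yields fixed-parameter tractability.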
