Pattern matching for arc-annotated sequences

We study pattern matching for arc-annotated sequences. We give an O(nm)-time algorithm for APS(NESTED,NESTED), the problem of deciding whether a sequence of length m with nested arc annotation is an arc-preserving subsequence (aps) of a sequence of length n with nested arc annotation. Arc-annotated sequences, and in particular those with nested arc annotation, are motivated by applications in RNA structure comparison. Our algorithm generalizes results for ordered tree inclusion problems and is useful for recent fixed-parameter algorithms for LAPCS(NESTED,NESTED), the problem of computing a longest arc-preserving common subsequence of two sequences with nested arc annotations. In particular, the presented dynamic programming methodology implies a quadratic-time algorithm for an open problem posed by Vialette.
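
To make the aps relation concrete, the following is a minimal brute-force sketch in Python. It is not the O(nm) dynamic programming algorithm of the paper; it runs in exponential time and only illustrates the problem statement. It assumes the usual definition from the literature: the pattern is an aps of the text if some order-preserving choice of text positions spells the pattern and, for every pair of chosen positions, an arc joins them in the text exactly when an arc joins the corresponding pattern positions. The function name `is_aps_bruteforce`, the 0-based positions, and the representation of arcs as pairs of positions are assumptions made for this illustration. The check does not verify that the arc annotations are nested.

```python
from itertools import combinations


def is_aps_bruteforce(text, text_arcs, pattern, pattern_arcs):
    """Decide by exhaustive search whether `pattern` is an arc-preserving
    subsequence of `text`. Illustration only: exponential time, and not
    the O(nm) algorithm described in the abstract.

    Arcs are pairs of 0-based positions (an assumption of this sketch).
    """
    n, m = len(text), len(pattern)
    t_arcs = {tuple(sorted(arc)) for arc in text_arcs}
    p_arcs = {tuple(sorted(arc)) for arc in pattern_arcs}

    # Try every increasing choice of m text positions.
    for positions in combinations(range(n), m):
        # The chosen text characters must spell the pattern.
        if any(text[positions[k]] != pattern[k] for k in range(m)):
            continue
        # Arcs must be preserved in both directions for every chosen pair.
        arcs_match = all(
            ((i, j) in p_arcs) == ((positions[i], positions[j]) in t_arcs)
            for i in range(m)
            for j in range(i + 1, m)
        )
        if arcs_match:
            return True
    return False


# Text ACGU with a single arc joining the A and the U.
text, text_arcs = "ACGU", [(0, 3)]
print(is_aps_bruteforce(text, text_arcs, "AU", [(0, 1)]))  # arc kept
print(is_aps_bruteforce(text, text_arcs, "AU", []))        # arc dropped
```

In this example the first call succeeds because the pattern keeps the arc between A and U, while the second fails because the only A and U in the text are joined by an arc that the pattern lacks.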