Local Exact Pattern Matching for Non-Fixed RNA Structures

Detecting local common sequence-structure regions of RNAs is a biologically important problem, because such regions allow biologists to identify functionally relevant similarities between the inspected molecules. We develop dynamic programming algorithms for finding common sequence-structure patterns between two RNAs. Each RNA is given by its sequence and a set of potential base pairs with associated probabilities. In contrast to prior work on local pattern matching of RNAs, we support the breaking of arcs. This adds flexibility over matching only fixed structures: a match may cover only a similar subset of the specified base pairs. We present an O(n^3) algorithm for local exact pattern matching between two nested RNAs, and an O(n^3 log n) algorithm for one nested RNA and one bounded-unlimited RNA. In addition, we introduce an algorithm for approximate pattern matching that, for two given nested RNAs and a number k, finds the maximal local pattern matching score between the two RNAs with at most k mismatches in O(n^3 k^2) time. Finally, we present an O(n^3) algorithm for finding the most similar subforest between two nested RNAs.
