Comparing RNA Structures with Biologically Relevant Operations Cannot Be Done without Strong Combinatorial Restrictions

Arc-annotated sequences are useful for representing structural information of RNAs and have been extensively used for comparing RNA structures in both terms of sequence and structural similarities. Among the many paradigms referring to arc-annotated sequences and RNA structures comparison (see [2] for more details), the most important one is the general edit distance. The problem of computing an edit distance between two non-crossing arc-annotated sequences was introduced in [5]. The introduced model uses edit operations that involve either single letters or pairs of letters (never considered separately) and is solvable in polynomial-time [12]. To account for other possible RNA structural evolutionary events, new edit operations, allowing to consider either silmutaneously or separately letters of a pair were introduced in [9]; unfortunately at the cost of computational tractability. It has been proved that comparing two RNA secondary structures using a full set of biologically relevant edit operations is NP-complete. Nevertheless, in [8], the authors have used a strong combinatorial restriction in order to compare two RNA stem-loops with a full set of biologically relevant edit operations; which have allowed them to design a polynomial-time and space algorithm for comparing general secondary RNA structures. In this paper we will prove theoretically that comparing two RNA structures using a full set of biologically relevant edit operations cannot be done without strong combinatorial restrictions.

[1]  Bin Ma,et al.  The longest common subsequence problem for arc-annotated sequences , 2004, J. Discrete Algorithms.

[2]  Alain Denise,et al.  Alignments of RNA Structures , 2010, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[3]  Mike Paterson,et al.  Combinatorics, Algorithms, Probabilistic and Experimental Methodologies, First International Symposium, ESCAPE 2007, Hangzhou, China, April 7-9, 2007, Revised Selected Papers , 2007, ESCAPE.

[4]  Michael R. Fellows,et al.  Algorithms and complexity for annotated sequence analysis , 1999 .

[5]  Cédric Chauve,et al.  An Edit Distance Between RNA Stem-Loops , 2005, SPIRE.

[6]  Bin Ma,et al.  A General Edit Distance between RNA Structures , 2002, J. Comput. Biol..

[7]  Carlos Eduardo Ferreira,et al.  Advances in Bioinformatics and Computational Biology, 5th Brazilian Symposium on Bioinformatics, BSB 2010, Rio de Janeiro, Brazil, August 31-September 3, 2010. Proceedings , 2010, BSB.

[8]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[9]  Wojciech Rytter,et al.  Extracting Powers and Periods in a String from Its Runs Structure , 2010, SPIRE.

[10]  Zhi-Zhong Chen,et al.  The longest common subsequence problem for sequences with nested arc annotations , 2002, J. Comput. Syst. Sci..

[11]  Rolf Niedermeier,et al.  Computing the similarity of two sequences with nested arc annotations , 2004, Theor. Comput. Sci..

[12]  Bin Ma,et al.  The Longest Common Subsequence Problem for Arc-Annotated Sequences , 2000, CPM.

[13]  Kaizhong Zhang,et al.  Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems , 1989, SIAM J. Comput..

[14]  Guillaume Fertin,et al.  Comparing RNA Structures: Towards an Intermediate Model Between the Editand the LapcsProblems , 2007, BSB.

[15]  Rolf Niedermeier,et al.  Pattern matching for arc-annotated sequences , 2006, TALG.

[16]  Guillaume Fertin,et al.  Extending the Hardness of RNA Secondary Structure Comparison , 2007, ESCAPE.