Paper information - Tree edit distance with gaps

Tree edit distance with gaps

The purpose of this paper is to study the definition of edit distances with convex gap weights for trees. In the special case of strings, this problem has led to classical solutions: for example, Galil and Giancarlo give an O(n log(n)) algorithm in [2]. For trees, standard edit distance algorithms ([7], or more recently [4] with an O(n log(n)) solution) are concerned with linear gap weights induced by pointwise edit operations: inserting or removing a single node (or a single edge) at each step. These algorithms may be adapted to deal with affine gap weights, with gap opening penalties and gap extension penalties. However, as far as we know, there has been no attempt to extend those results to tree edit distances with arbitrary gap weights. The major motivation for this work comes from computational biology, namely the comparison of RNA molecules. RNA secondary structures without tertiary interactions, such as pseudoknots or base triples, may be canonically encoded by trees; see [6] for details. Comparing RNA structures therefore amounts to computing edit distances between trees. It is widely accepted that the insertion or deletion of a set of contiguous nucleotides can be assumed to result from a single mutational event. It therefore makes no sense to assign linear weight functions, as existing methods do. Convex gap weight functions are much more sensible in this context. In the paper, we first prove that there exists no polynomial algorithm for the problem with convex gap weights, unless P = NP. In the second part, we restrict the definition of gaps to complete subtrees, and we obtain a quadratic algorithm for the associated tree edit distance.
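To make the distinction between linear and non-linear gap weights concrete, here is a minimal sketch for the string case only. It is not the algorithm of this paper, nor the faster method attributed to Galil and Giancarlo: it is the naive cubic-time dynamic program, in which a run of k consecutive inserted or deleted characters is charged `gap(k)` as a single event. The function name `gap_edit_distance`, the unit substitution cost `sub`, and the two example weight functions are assumptions chosen for illustration.

```python
import math
from typing import Callable


def gap_edit_distance(a: str, b: str,
                      gap: Callable[[int], float],
                      sub: float = 1.0) -> float:
    """Edit distance between strings a and b where a gap of length k costs gap(k).

    Naive O(n * m * (n + m)) dynamic program (illustrative only).
    D[i][j] is the cheapest way to turn a[:i] into b[:j].
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = math.inf
            # Match or substitute the last characters of both prefixes.
            if i > 0 and j > 0:
                best = D[i - 1][j - 1] + (0.0 if a[i - 1] == b[j - 1] else sub)
            # End with a gap of length k that deletes a[i-k:i].
            for k in range(1, i + 1):
                best = min(best, D[i - k][j] + gap(k))
            # End with a gap of length k that inserts b[j-k:j].
            for k in range(1, j + 1):
                best = min(best, D[i][j - k] + gap(k))
            D[i][j] = best
    return D[n][m]


if __name__ == "__main__":
    # Linear weights: four deleted characters cost four unit operations.
    print(gap_edit_distance("ACGGGGT", "ACT", lambda k: float(k)))  # 4.0
    # A weight whose per-character increment shrinks with gap length:
    # the same deletion is charged once, as a single event of cost sqrt(4).
    print(gap_edit_distance("ACGGGGT", "ACT", math.sqrt))  # 2.0
```

With the linear weight, deleting `GGGG` costs the same as four independent deletions. With the second weight, one long gap is cheaper than several short ones, which is the behaviour the abstract argues for when contiguous nucleotides are inserted or deleted in one mutational event.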

Hélène Touzet

[1] M. Maes, et al. On a Cyclic String-To-String Correction Problem, 1990, Inf. Process. Lett.

[2] Kaizhong Zhang, et al. Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems, 1989, SIAM J. Comput.

[3] Philip N. Klein, et al. Computing the Edit-Distance between Unrooted Ordered Trees, 1998, ESA.

[4] Raffaele Giancarlo, et al. Speeding up Dynamic Programming with Applications to Molecular Biology, 1989, Theor. Comput. Sci.

[5] Kaizhong Zhang, et al. On the Editing Distance Between Unordered Labeled Trees, 1992, Inf. Process. Lett.

[6] Sudarshan S. Chawathe, et al. Comparing Hierarchical Data in External Memory, 1999, VLDB.

[7] Kaizhong Zhang, et al. Comparing multiple RNA secondary structures using tree comparisons, 1990, Comput. Appl. Biosci.