Tree edit distance with gaps

The purpose of this paper is to study the definition of edit distances with convex gap weights for trees. In the special case of strings, this problem has led to classical solutions: for example, Galil and Giancarlo give an O(n log(n)) algorithm in [2]. For trees, standard edit distance algorithms ([7], or more recently [4] with an O(n log(n)) solution) deal with linear gap weights induced by pointwise edit operations: inserting or removing a single node (or a single edge) at each step. These algorithms can be adapted to handle affine gap weights, with gap opening and gap extension penalties. However, as far as we know, there has been no attempt to extend these results to tree edit distances with arbitrary gap weights. The main motivation for this work comes from computational biology, specifically the comparison of RNA molecules. RNA secondary structures without tertiary interactions, such as pseudoknots or base triples, can be canonically encoded as trees; see [6] for details. Comparing RNA structures thus amounts to computing edit distances between trees. It is widely accepted that the insertion or deletion of a set of contiguous nucleotides can be assumed to result from a single mutational event. It therefore makes no sense to assign linear weight functions, as existing methods do. Convex gap weight functions are much more sensible in this context. In this paper, we first prove that no polynomial-time algorithm exists for the problem with convex gap weights, unless P = NP. In the second part, we restrict the definition of gaps to complete subtrees and obtain a quadratic algorithm for the associated tree edit distance.