An Efficient Algorithm for Some Tree Matching Problems

Abstract. In this paper we consider ordered h-ary trees, that is, trees whose nodes have exactly h sons, and ranked trees, where the number of sons depends on the node label. We define the subtree distance between two ordered h-ary trees T1 and T2 as the number of subtrees that must be inserted or deleted in T1 to obtain T2, and consider the problem of finding all the occurrences, with distance bounded by k, of an h-ary tree P as a subtree of another h-ary tree T. This problem is solved in time O(h|P| + max(h, k)|T|). We then study the classical problem of finding all the occurrences of a ranked tree P in another tree T, where the two trees are labelled and a special label v in the leaves of P stands for any subtree in T. An extension of the previous algorithm solves this problem in time O(|P| + k|T|), where k is the number of labels v in P. We also discuss some natural variants of the two problems.
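To make the second problem concrete, the following is a minimal sketch of a naive matcher for a ranked pattern P whose leaves may carry the special label v. It is not the algorithm of this paper: it simply tests P at every node of T and so takes O(|P||T|) time in the worst case, which is the baseline the stated O(|P| + k|T|) bound improves on. The names Node, WILDCARD, matches_at and occurrences are illustrative assumptions, as is the choice to represent the label v by the string "v".

```python
from dataclasses import dataclass, field
from typing import List

WILDCARD = "v"  # assumed encoding of the special leaf label v


@dataclass
class Node:
    label: str
    sons: List["Node"] = field(default_factory=list)


def matches_at(p: Node, t: Node) -> bool:
    """Return True if pattern p occurs at node t.

    A leaf of p labelled WILDCARD stands for any subtree of t.
    """
    if p.label == WILDCARD and not p.sons:
        return True
    # In a ranked tree the label fixes the number of sons; the length
    # check only guards against malformed input.
    if p.label != t.label or len(p.sons) != len(t.sons):
        return False
    return all(matches_at(ps, ts) for ps, ts in zip(p.sons, t.sons))


def occurrences(p: Node, t: Node) -> List[Node]:
    """Naive baseline: list, in preorder, every node of t at which p occurs."""
    found = []
    stack = [t]
    while stack:
        node = stack.pop()
        if matches_at(p, node):
            found.append(node)
        # Push sons right to left so that they are popped left to right.
        stack.extend(reversed(node.sons))
    return found


# Text tree T = f(g(a, b), g(c, b)); pattern P = g(v, b).
T = Node("f", [Node("g", [Node("a"), Node("b")]),
               Node("g", [Node("c"), Node("b")])])
P = Node("g", [Node(WILDCARD), Node("b")])
hits = occurrences(P, T)
```

In this example P occurs at both sons of the root, since the leaf labelled v matches the subtrees a and c respectively.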