Constant Factor Approximation of Edit Distance of Bounded Height Unordered Trees

The edit distance problem on two unordered trees is known to be MAX SNP-hard. In this paper, we present an approximation algorithm with approximation ratio 2h + 2 under unit-cost edit operations, where h is the maximum height of the two input trees. The algorithm is based on an embedding of unit-cost tree edit distance into L1 distance. We also present an efficient implementation of the algorithm using randomized dimension reduction.
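
To make the idea of embedding trees into L1 space concrete, the following is a minimal sketch and not necessarily the construction used in the paper. It assumes, as an illustration only, that each tree is mapped to a vector counting the isomorphism classes of the subtrees rooted at its nodes, and that the L1 distance between two such vectors serves as the estimate. The tree representation `(label, [children])` and the names `subtree_features` and `l1_distance` are hypothetical, and labels are assumed to contain no parentheses. The sketch does not verify the 2h + 2 ratio and omits the randomized dimension reduction.

```python
from collections import Counter


def subtree_features(tree):
    """Count the canonical forms of the subtrees rooted at every node.

    A tree is a pair (label, [children]). Sorting the canonical forms of
    the children makes the encoding independent of child order, so two
    isomorphic unordered trees produce the same counts.
    Assumption: this feature set is illustrative, not taken from the paper.
    """
    counts = Counter()

    def visit(node):
        label, children = node
        form = "(" + str(label) + "".join(sorted(visit(c) for c in children)) + ")"
        counts[form] += 1
        return form

    visit(tree)
    return counts


def l1_distance(a, b):
    """L1 distance between two sparse count vectors stored as Counters."""
    return sum(abs(a[key] - b[key]) for key in a.keys() | b.keys())


if __name__ == "__main__":
    t1 = ("a", [("b", []), ("c", [])])
    t2 = ("a", [("c", []), ("b", [])])  # same tree, children reordered
    t3 = ("a", [("b", []), ("d", [])])  # one leaf relabeled
    print(l1_distance(subtree_features(t1), subtree_features(t2)))
    print(l1_distance(subtree_features(t1), subtree_features(t3)))
```

In this sketch, reordering children leaves the distance at zero, while a single relabeling changes the feature of the relabeled node and of every ancestor, which suggests why a bound on the tree height would enter an approximation ratio.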
