Fast Computation of the Tree Edit Distance between Unordered Trees Using IP Solvers

We propose a new method for computing the tree edit distance between two unordered trees by problem encoding. Our method transforms an instance of this computation into an instance of an integer programming (IP) problem and solves it with an efficient IP solver. The tree edit distance is defined as the minimum cost of a sequence of edit operations (substitution, deletion, or insertion) that transforms one tree into another. Although computing it for unordered trees is NP-hard, several encoding techniques have been proposed for computational efficiency; one example is an encoding into the clique problem. As a new encoding method, we propose to use IP solvers and provide new IP formulations of the problem of finding a minimum-cost mapping between two unordered trees, where the minimum cost coincides exactly with the tree edit distance. A variety of IP solvers, beyond those for the clique problem, can be used, and our method can efficiently compute variations of the tree edit distance by adding further constraints. Our experimental results on glycan datasets and the CSLOGS web log datasets show that our method is much faster than an existing method when the input trees have a large degree. We also show that two variations of the tree edit distance can be computed efficiently by IP solvers.
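To make the mapping view concrete, here is a minimal sketch, not the paper's IP formulation. It assumes unit costs and a hypothetical tree encoding of `node -> (label, parent)`. It introduces one 0-1 decision per node pair (u, v), with two kinds of constraints: a one-to-one constraint, and an ancestor-consistency constraint, so that u1 is an ancestor of u2 exactly when v1 is an ancestor of v2. Sibling order is ignored because the trees are unordered. The cost of a mapping is its substitutions plus one deletion per unmapped node of the first tree and one insertion per unmapped node of the second. The helper names (`is_ancestor`, `compatible`, `edit_distance`) are illustrative, and the exhaustive search stands in for a real IP solver.

```python
# Sketch: unordered tree edit distance as a minimum-cost mapping over 0-1
# pair variables, solved by exhaustive search instead of an IP solver.
# Assumptions: unit costs; a tree is a dict node -> (label, parent).

def is_ancestor(tree, a, b):
    """True if node a is a proper ancestor of node b."""
    p = tree[b][1]
    while p is not None:
        if p == a:
            return True
        p = tree[p][1]
    return False

def compatible(t1, t2, p, q):
    """Can pairs p=(u1, v1) and q=(u2, v2) both be set to 1?"""
    (u1, v1), (u2, v2) = p, q
    if u1 == u2 or v1 == v2:
        return False  # one-to-one constraint
    # ancestor relations must be preserved in both directions
    return (is_ancestor(t1, u1, u2) == is_ancestor(t2, v1, v2)
            and is_ancestor(t1, u2, u1) == is_ancestor(t2, v2, v1))

def edit_distance(t1, t2):
    """Minimum mapping cost = unordered tree edit distance (unit costs)."""
    pairs = [(u, v) for u in t1 for v in t2]  # one 0-1 variable per pair
    best = len(t1) + len(t2)  # trivial bound: delete all, insert all

    def search(i, chosen, sub_cost):
        nonlocal best
        if i == len(pairs):
            # unmapped t1 nodes are deleted, unmapped t2 nodes are inserted
            total = sub_cost + (len(t1) - len(chosen)) + (len(t2) - len(chosen))
            best = min(best, total)
            return
        p = pairs[i]
        search(i + 1, chosen, sub_cost)  # x_p = 0
        if all(compatible(t1, t2, p, q) for q in chosen):  # x_p = 1
            u, v = p
            sub = 0 if t1[u][0] == t2[v][0] else 1
            chosen.append(p)
            search(i + 1, chosen, sub_cost + sub)
            chosen.pop()

    search(0, [], 0)
    return best

# a(b, c) and a(c, b) are the same unordered tree
t1 = {1: ("a", None), 2: ("b", 1), 3: ("c", 1)}
t2 = {1: ("a", None), 2: ("c", 1), 3: ("b", 1)}
print(edit_distance(t1, t2))  # 0
```

The search runs in time exponential in the number of node pairs, so it is only usable for tiny trees. Its point is to show which constraints a formulation must enforce. A practical implementation would pass the same variables, constraints, and objective to an IP solver such as SCIP.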