Paper information - Finding similar consensus between trees: an algorithm and a distance hierarchy

Abstract: The problem of finding the similar consensus (also known as the largest approximately common substructures) of two trees arises in many pattern recognition applications. This paper presents a dynamic programming algorithm that solves the problem using the distance measure originated by Tanaka and Tanaka. When the allowed distance between the common substructures is a constant independent of the input trees, the algorithm runs as fast as the best-known algorithm for comparing two trees under Tanaka's distance measure. In addition, we establish a hierarchy among Tanaka's distance measure and three other edit-based distance measures published in the literature.

Kaizhong Zhang | Jason Tsong-Li Wang
