Partial Tree-Edit Distance: A Solution to the Default Class Problem in Pattern-Based Tree Classification

Pattern-based tree classifiers can produce high-quality results, but they are prone to overusing the default class. In this paper, we propose a measure designed to address this issue, called partial tree-edit distance (PTED), which assesses the degree to which one tree is contained in another. Furthermore, we propose an algorithm that calculates the measure, and we report an experiment involving pattern-based classification that illustrates its usefulness. The results show that incorporating PTED into the classification scheme significantly improved accuracy on the tested datasets.
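To make the default class problem concrete, the following is a minimal sketch in Python of a rule-based tree classifier with and without a distance-based fallback. It is not the PTED algorithm proposed in the paper: the function toy_containment_cost is a hypothetical stand-in that only counts pattern labels missing from a tree, and the names Node, classify, rules, and doc are likewise assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Node:
    """A labeled, ordered tree node (illustrative structure, not from the paper)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def labels(tree: Node) -> Set[str]:
    """Collect the set of labels occurring anywhere in the tree."""
    found = {tree.label}
    for child in tree.children:
        found |= labels(child)
    return found


def toy_containment_cost(pattern: Node, tree: Node) -> int:
    """Hypothetical stand-in for a containment measure (NOT PTED).

    Counts the pattern labels that do not occur in the tree;
    0 means every pattern label is present somewhere in the tree.
    """
    return len(labels(pattern) - labels(tree))


def classify(tree: Node, rules: List[Tuple[Node, str]],
             default: str, use_fallback: bool) -> str:
    """Return the class of the first rule whose pattern has cost 0.

    If no rule matches, return the default class, or, when
    use_fallback is True, the class of the lowest-cost pattern.
    """
    costs = [(toy_containment_cost(p, tree), cls) for p, cls in rules]
    for cost, cls in costs:
        if cost == 0:
            return cls
    if not use_fallback or not costs:
        return default
    return min(costs, key=lambda pair: pair[0])[1]


# Two toy rules: pattern a(b, c) -> class "X", pattern d(e, f, g) -> class "Y".
rules = [
    (Node("a", [Node("b"), Node("c")]), "X"),
    (Node("d", [Node("e"), Node("f"), Node("g")]), "Y"),
]

# A document that almost matches the first pattern (label "c" is missing).
doc = Node("a", [Node("b"), Node("z")])

without_fallback = classify(doc, rules, default="default", use_fallback=False)
with_fallback = classify(doc, rules, default="default", use_fallback=True)
```

In this sketch the document falls to the default class when only exact matches count, but is assigned class "X" once the closest pattern is allowed to decide. A containment-oriented measure such as PTED would take the place of the toy cost in a scheme of this kind.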
