New Algorithms for Unordered Tree Inclusion

The tree inclusion problem is, given two node-labeled trees $P$ and $T$ (the "pattern tree" and the "text tree"), to locate every minimal subtree in $T$ (if any) that can be obtained by applying a sequence of node insertion operations to $P$. The ordered tree inclusion problem is known to be solvable in polynomial time, whereas the unordered tree inclusion problem is NP-hard. The currently fastest algorithm for the latter dates from 1995 and runs in $O(\mathrm{poly}(m,n) \cdot 2^{2d}) = O^{\ast}(4^{d})$ time. Here $m$ and $n$ are the sizes of the pattern and text trees, respectively, $d$ is the degree of the pattern tree, and $O^{\ast}$ suppresses factors polynomial in $m$ and $n$. We develop a new algorithm that improves the exponent $2d$ to $d$ by considering a particular type of ancestor-descendant relationship and applying dynamic programming, thus reducing the time complexity to $O^{\ast}(2^{d})$. We then study restricted variants of the unordered tree inclusion problem in which the number of occurrences of each node label and/or the heights of the input trees are bounded. We show that although the problem remains NP-hard in many such cases, if the leaves of $P$ are distinctly labeled and each label occurs at most $c$ times in $T$, then it can be solved in polynomial time for $c = 2$ and in $O^{\ast}(1.8^d)$ time for $c = 3$.
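To make the unordered inclusion relation concrete, here is a minimal Python sketch of a naive checker. It assumes the usual embedding view of inclusion: $P$ is included in a subtree $T(v)$ when the nodes of $P$ can be mapped injectively into $T(v)$ so that labels and ancestor-descendant relationships are preserved, with no constraint on sibling order. This is neither the $O^{\ast}(2^{d})$ algorithm described above nor the 1995 algorithm. For each pattern node it simply tracks which subsets of that node's children can be placed at pairwise unrelated (non-ancestor) text nodes, and it makes no attempt to match either time bound. The function name `minimal_inclusions` and the parallel-list tree encoding (node 0 as the root) are hypothetical choices made for this illustration.

```python
from functools import lru_cache


def minimal_inclusions(
    p_labels: list[str],
    p_children: list[list[int]],
    t_labels: list[str],
    t_children: list[list[int]],
) -> list[int]:
    """Naive sketch: roots v of all minimal subtrees T(v) that include P (unordered).

    Hypothetical encoding: node i has label labels[i] and children children[i];
    node 0 is the root of each tree. Exponential in the degree of P.
    """

    @lru_cache(maxsize=None)
    def below(u: int, v: int) -> frozenset[int]:
        # Bitmasks S over u's children such that the children in S can be
        # embedded at pairwise non-ancestor-related nodes strictly below v.
        masks = {0}
        for c in t_children[v]:
            # Subsets that fit inside T(c): combinations strictly below c,
            # or a single child of u mapped onto c itself.
            inside = set(below(u, c))
            for i, k in enumerate(p_children[u]):
                if emb(k, c):
                    inside.add(1 << i)
            # Different child subtrees of v host disjoint subsets of u's children.
            masks = {a | b for a in masks for b in inside if a & b == 0}
        return frozenset(masks)

    @lru_cache(maxsize=None)
    def emb(u: int, v: int) -> bool:
        # Can the pattern subtree P(u) be embedded with u mapped onto v?
        if p_labels[u] != t_labels[v]:
            return False
        full = (1 << len(p_children[u])) - 1
        return full in below(u, v)

    @lru_cache(maxsize=None)
    def hit_below(v: int) -> bool:
        # Does the root of P embed at some proper descendant of v?
        return any(emb(0, c) or hit_below(c) for c in t_children[v])

    # T(v) is minimal iff P's root embeds at v but at no proper descendant of v.
    return [v for v in range(len(t_labels)) if emb(0, v) and not hit_below(v)]
```

For example, with the pattern `a(b, c)` and the text `a(d(c, b), b)`, the call `minimal_inclusions(["a", "b", "c"], [[1, 2], [], []], ["a", "d", "c", "b", "b"], [[1, 4], [2, 3], [], [], []])` returns `[0]`, since sibling order in $T$ is irrelevant. A text chain `a(b(c))` instead yields `[]`, because the images of the siblings `b` and `c` would be ancestor-related.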
