Mining Induced and Embedded Subtrees in Ordered, Unordered, and Partially-Ordered Trees

Many data mining problems can be represented with non-linear data structures like trees. In this paper, we introduce a scalable algorithm to mine partially-ordered trees. Our algorithm, POTMiner, is able to identify both induced and embedded subtrees and, as special cases, it can handle both completely ordered and completely unordered trees (i.e. the particular situations existing algorithms address).

[1]  Richi Nayak,et al.  Knowledge Discovery from XML Documents , 2006, Lecture Notes in Computer Science.

[2]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[3]  Harald C. Gall,et al.  4th International Workshop on Mining Software Repositories (MSR 2007) , 2007, 29th International Conference on Software Engineering (ICSE'07 Companion).

[4]  Saso Dzeroski,et al.  Multi-relational data mining: an introduction , 2003, SKDD.

[5]  Hiroki Arimura,et al.  Optimized Substructure Discovery for Semi-structured Data , 2002, PKDD.

[6]  Chun Zhang,et al.  Storing and querying ordered XML using a relational database system , 2002, SIGMOD '02.

[7]  Yun Chi,et al.  Mining Closed and Maximal Frequent Subtrees from Databases of Labeled Rooted Trees , 2005, IEEE Trans. Knowl. Data Eng..

[8]  Zhigang Li,et al.  Efficient data mining for maximal frequent subtrees , 2003, Third IEEE International Conference on Data Mining.

[9]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[10]  Mohammed J. Zaki Efficiently Mining Frequent Embedded Unordered Trees , 2004, Fundam. Informaticae.

[11]  Fernando Berzal Galiano,et al.  Hierarchical Program Representation for Program Element Matching , 2007, IDEAL.

[12]  Hiroki Arimura,et al.  Efficient Substructure Discovery from Large Semi-Structured Data , 2001, IEICE Trans. Inf. Syst..

[13]  Hiroyuki Kawano,et al.  AMIOT: induced ordered tree mining in tree-structured databases , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[14]  Philip S. Yu,et al.  CrossMine: efficient classification across multiple database relations , 2004, Proceedings. 20th International Conference on Data Engineering.

[15]  Yun Chi,et al.  Frequent Subtree Mining - An Overview , 2004, Fundam. Informaticae.

[16]  Richi Nayak,et al.  Knowledge discovery from XML documents : First International Workshop, KDXD 2006, Singapore, April 9, 2006 : proceedings , 2006 .

[17]  Alexandre Termier,et al.  TreeFinder: a first step towards XML data mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[18]  Hiroki Arimura,et al.  Discovering Frequent Substructures in Large Unordered Trees , 2003, Discovery Science.

[19]  Joost N. Kok,et al.  Efficient discovery of frequent unordered trees , 2003 .

[20]  Mohammed J. Zaki Efficiently mining frequent trees in a forest: algorithms and applications , 2005, IEEE Transactions on Knowledge and Data Engineering.

[21]  Philip S. Yu,et al.  Cross-relational clustering with user's guidance , 2005, KDD '05.

[22]  Chen Wang,et al.  Efficient Pattern-Growth Methods for Frequent Tree Pattern Mining , 2004, PAKDD.

[23]  Yun Chi,et al.  HybridTreeMiner: an efficient algorithm for mining frequent rooted trees and free trees using canonical forms , 2004, Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004..

[24]  Yun Chi,et al.  Indexing and mining free trees , 2003, Third IEEE International Conference on Data Mining.