POTMiner: mining ordered, unordered, and partially-ordered trees

Non-linear data structures are increasingly common in data mining problems. Trees, in particular, are amenable to efficient mining techniques. In this paper, we introduce POTMiner, a scalable and parallelizable algorithm for mining partially-ordered trees. POTMiner identifies both induced and embedded subtrees in such trees and, as special cases, also handles completely ordered and completely unordered trees.
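To make the induced/embedded distinction concrete, the following is a minimal sketch and not POTMiner itself. It assumes the usual tree-mining definitions: an induced subtree preserves direct parent-child edges, while an embedded subtree only needs to preserve ancestor-descendant relationships. The toy tree and the helper names `is_child` and `is_descendant` are hypothetical, introduced only for illustration.

```python
# Hypothetical toy tree stored as child -> parent (the root maps to None):
#
#       a
#      / \
#     b   d
#     |
#     c
parent = {"a": None, "b": "a", "c": "b", "d": "a"}


def is_child(parent, node, ancestor):
    """Induced relation: `ancestor` is the direct parent of `node`."""
    return parent[node] == ancestor


def is_descendant(parent, node, ancestor):
    """Embedded relation: `ancestor` lies on the path from `node` to the root."""
    current = parent[node]
    while current is not None:
        if current == ancestor:
            return True
        current = parent[current]
    return False


# The pattern edge a -> b can appear in an induced subtree (and an embedded one).
print(is_child(parent, "b", "a"), is_descendant(parent, "b", "a"))

# The pattern edge a -> c skips node b, so it can only appear in an embedded subtree.
print(is_child(parent, "c", "a"), is_descendant(parent, "c", "a"))
```

Every induced occurrence is also an embedded one, which is why an embedded-subtree miner generally has a larger pattern space to explore than an induced-subtree miner on the same data.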
