Model guided algorithm for mining unordered embedded subtrees

A large amount of online information is, or can be, represented using semi-structured documents such as XML. The information contained in an XML document can be effectively represented as a rooted ordered labeled tree. As a result, the frequent pattern mining problem has been recast as the frequent subtree mining problem, which is a prerequisite for association rule mining from tree-structured documents. Driven by different application needs, a number of algorithms have been developed for mining different subtree types under different support definitions. In this paper we present an algorithm for mining unordered embedded subtrees. The proposed U3 algorithm extends our general tree model guided (TMG) candidate generation framework and handles all support definitions, namely transaction-based, occurrence-match and hybrid support. We present experiments on synthetic and real-world data sets. The results demonstrate the flexibility of our general TMG framework, as well as its efficiency compared to the existing state-of-the-art approach.
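The three support definitions named above differ only in how the occurrences of a candidate subtree are counted. The following is a minimal sketch, not the U3 implementation, assuming the occurrences of one candidate have already been found and summarised as a hypothetical map from transaction id to occurrence count. Transaction-based support counts the transactions that contain the subtree at least once. Occurrence-match support counts every occurrence in the database. Hybrid support is taken here as a threshold pair x|y: the subtree is frequent if it occurs at least y times in each of at least x transactions.

```python
from typing import Dict

# Hypothetical input format (an assumption for illustration): for a single
# candidate subtree, map each transaction id to the number of times the
# subtree occurs in that transaction.
OccurrenceCounts = Dict[int, int]


def transaction_support(occ: OccurrenceCounts) -> int:
    """Transaction-based support: transactions containing the subtree at least once."""
    return sum(1 for n in occ.values() if n > 0)


def occurrence_match_support(occ: OccurrenceCounts) -> int:
    """Occurrence-match support: total occurrences across all transactions."""
    return sum(occ.values())


def is_hybrid_frequent(occ: OccurrenceCounts, x: int, y: int) -> bool:
    """Hybrid support x|y: frequent if the subtree occurs at least y times
    in each of at least x transactions."""
    return sum(1 for n in occ.values() if n >= y) >= x


if __name__ == "__main__":
    occ = {0: 3, 1: 1, 2: 0, 3: 2}
    print(transaction_support(occ))       # 3
    print(occurrence_match_support(occ))  # 6
    print(is_hybrid_frequent(occ, 2, 2))  # True: transactions 0 and 3
    print(is_hybrid_frequent(occ, 3, 2))  # False
```

With the same occurrence counts, a candidate can be frequent under one definition and infrequent under another, which is why a miner that supports all three has to retain per-transaction occurrence information rather than a single aggregate count.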