DryadeParent, An Efficient and Robust Closed Attribute Tree Mining Algorithm

In this paper, we present a new tree mining algorithm, DryadeParent, based on the hooking principle first introduced in DRYADE. In our experiments, we demonstrate that the branching factor and the depth of the frequent patterns to be found are key factors in the complexity of tree mining algorithms, although they have often been overlooked in previous work. We show that DryadeParent outperforms the current fastest algorithm, CMTreeMiner, by orders of magnitude on data sets where the frequent tree patterns have a high branching factor.
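To make the two quantities named above concrete, the following minimal sketch computes the depth and the maximum branching factor of a small rooted tree. It is only an illustration: the dictionary-based tree representation, the function names, and the example pattern are assumptions chosen for this sketch, and it is not part of DryadeParent or CMTreeMiner.

```python
from typing import Dict, List

# Hypothetical representation (an assumption of this sketch):
# a rooted tree is a mapping from each node id to the list of its child ids.
Tree = Dict[str, List[str]]


def depth(tree: Tree, root: str) -> int:
    """Return the number of edges on the longest root-to-leaf path."""
    children = tree.get(root, [])
    if not children:
        return 0
    return 1 + max(depth(tree, child) for child in children)


def max_branching_factor(tree: Tree) -> int:
    """Return the largest number of children held by any single node."""
    return max((len(children) for children in tree.values()), default=0)


# Illustrative pattern: root "a" has three children, and "b" has one child.
pattern: Tree = {"a": ["b", "c", "d"], "b": ["e"], "c": [], "d": [], "e": []}

print(depth(pattern, "a"), max_branching_factor(pattern))
```

In this example the pattern has depth 2 (the path a, b, e) and a maximum branching factor of 3 (the root).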
