Mining of closed frequent subtrees from frequently updated databases

We study the problem of mining closed frequent subtrees from tree databases that are updated regularly over time. Closed frequent subtrees provide condensed and complete information for all frequent subtrees in the database. Although mining closed frequent subtrees is in general faster than mining all frequent subtrees, this is still a very time consuming process, and thus it is undesirable to mine from scratch when the change to the database is small. The set of previous mined closed subtrees should be reused as much as possible to compute new emerging subtrees. We propose, in this paper, a novel and efficient incremental mining algorithm for closed frequent labeled ordered trees. We adopt a divide-and-conquer strategy and apply different mining techniques in different parts of the mining process. The proposed algorithm requires no additional scan of the whole database while its memory usage is reasonable. Our experimental study on both synthetic and real-life datasets demonstrates the efficiency and scalability of our algorithm.

[1]  Yun Chi,et al.  CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees , 2004, PAKDD.

[2]  Sen Zhang,et al.  Unordered tree mining with applications to phylogeny , 2004, Proceedings. 20th International Conference on Data Engineering.

[3]  Ricard Gavaldà,et al.  Mining adaptively frequent closed unlabeled rooted trees in data streams , 2008, KDD.

[4]  Hiroki Arimura,et al.  Online algorithms for mining semi-structured data stream , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[5]  Philip S. Yu,et al.  Graph Indexing: Tree + Delta >= Graph , 2007, VLDB.

[6]  Hiroki Arimura,et al.  Optimized Substructure Discovery for Semi-structured Data , 2002, PKDD.

[7]  Nicolás Marín,et al.  Review of Data on the Web: from relational to semistructured data and XML by Serge Abiteboul, Peter Buneman, and Dan Suciu. Morgan Kaufmann 1999. , 2003, SGMD.

[8]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[9]  Joost N. Kok,et al.  Efficient discovery of frequent unordered trees , 2003 .

[10]  José L. Balcázar,et al.  Mining frequent closed rooted trees , 2009, Machine Learning.

[11]  Sen Zhang,et al.  Discovering Frequent Agreement Subtrees from Phylogenetic Data , 2008, IEEE Transactions on Knowledge and Data Engineering.

[12]  Alexandre Termier,et al.  Dryade: a new approach for discovering closed frequent trees in heterogeneous tree databases , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[13]  Jiawei Han,et al.  IncSpan: incremental mining of sequential patterns in large database , 2004, KDD.

[14]  Peng Gao,et al.  A New Marketing Channel Management Strategy Based on Frequent Subtree Mining , 2007 .

[15]  Shijie Zhang,et al.  TreePi: A Novel Graph Indexing Method , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[16]  Mohammed J. Zaki Efficiently Mining Frequent Embedded Unordered Trees , 2004, Fundam. Informaticae.

[17]  Hiroki Arimura,et al.  Efficient Substructure Discovery from Large Semi-Structured Data , 2001, IEICE Trans. Inf. Syst..

[18]  Minoru Kanehisa,et al.  Mining significant tree patterns in carbohydrate sugar chains , 2008, ECCB.

[19]  Mohammed J. Zaki Efficiently mining frequent trees in a forest , 2002, KDD.

[20]  Alexandre Termier,et al.  DryadeParent, An Efficient and Robust Closed Attribute Tree Mining Algorithm , 2008, IEEE Transactions on Knowledge and Data Engineering.

[21]  Chen Wang,et al.  Efficient Pattern-Growth Methods for Frequent Tree Pattern Mining , 2004, PAKDD.

[22]  Hiroki Arimura,et al.  An Output-Polynomial Time Algorithm for Mining Frequent Closed Attribute Trees , 2005, ILP.

[23]  Dan Suciu,et al.  Data on the Web: From Relations to Semistructured Data and XML , 1999 .

[24]  Yun Chi,et al.  Frequent Subtree Mining - An Overview , 2004, Fundam. Informaticae.

[25]  Lei Zou,et al.  PrefixTreeESpan: A Pattern Growth Algorithm for Mining Embedded Subtrees , 2006, WISE.

[26]  Yun Chi,et al.  Indexing and mining free trees , 2003, Third IEEE International Conference on Data Mining.

[27]  Arbee L. P. Chen,et al.  Discovering Frequent Tree Patterns over Data Streams , 2006, SDM.

[28]  Tharam S. Dillon,et al.  Tree model guided candidate generation for mining frequent subtrees from XML documents , 2008, TKDD.