Mining frequent closed trees in evolving data streams

We propose new algorithms for adaptively mining closed rooted trees, both labeled and unlabeled, from data streams that change over time. Closed patterns are powerful representatives of frequent patterns, since they eliminate redundant information. Our approach is based on an advantageous representation of trees and a low-complexity notion of relaxed closed trees, as well as ideas from Galois lattice theory. More precisely, we present three closed tree mining algorithms in sequence: an incremental one, IncTreeMiner; a sliding-window-based one, WinTreeMiner; and finally one that mines closed trees adaptively from data streams, AdaTreeMiner. By adaptive we mean that it presents, at all times, the closed trees that are frequent in the current state of the data stream. To the best of our knowledge, this is the first work on mining closed frequent trees in streaming data that varies with time. We give a first experimental evaluation of the proposed algorithms.
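To make the notion of a closed pattern concrete, the following minimal sketch may help. It is not IncTreeMiner, WinTreeMiner, or AdaTreeMiner: as a simplifying assumption it uses itemsets instead of trees (set inclusion stands in for the subtree relation), counts supports by brute force, and recomputes everything from scratch over a fixed-size window rather than updating incrementally or adapting the window size. The function names `closed_frequent_itemsets` and `closed_over_window` are hypothetical and chosen only for this illustration.

```python
from collections import Counter, deque
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Return {pattern: support} for the frequent closed itemsets.

    A pattern is closed when no proper superpattern has the same support.
    Brute force: enumerates every non-empty subset of every transaction,
    so it is only suitable for tiny examples.
    """
    support = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                support[frozenset(combo)] += 1

    frequent = {p: s for p, s in support.items() if s >= min_support}

    # Keep a pattern only if no frequent proper superset has equal support.
    return {
        p: s
        for p, s in frequent.items()
        if not any(p < q and s == frequent[q] for q in frequent)
    }


def closed_over_window(stream, window_size, min_support):
    """Yield the closed frequent itemsets of the last `window_size` items.

    Assumption: a fixed-size window recomputed at every step, which is far
    simpler than an incremental or adaptive method.
    """
    window = deque(maxlen=window_size)
    for t in stream:
        window.append(t)
        yield closed_frequent_itemsets(window, min_support)
```

For instance, on the transactions `{a, b}`, `{a, b}`, `{a}` with a minimum support of 2, the pattern `{b}` is frequent but not closed, because `{a, b}` has the same support; only `{a}` and `{a, b}` are reported. When the window later contains only `{a}` transactions, the output changes accordingly, which is the behavior the abstract describes as tracking the current state of the stream.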
