Mining frequent closed itemsets from a landmark window over online data streams

The frequent closed itemsets determine exactly the complete set of frequent itemsets and are usually far fewer than the latter. However, mining frequent closed itemsets from a landmark window over data streams is a challenging problem. To address it, this paper presents a novel algorithm, FP-CDS, that captures all frequent closed itemsets, together with a new storage structure, the FP-CDS tree, that can be dynamically adjusted to reflect how itemsets' frequencies evolve over time. A landmark window is divided into several basic windows, which serve as the updating units. Potentially frequent closed itemsets in each basic window are mined and stored in the FP-CDS tree according to the proposed strategies. Extensive experiments are conducted to validate the proposed method.
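To make the terms concrete, the following is a minimal brute-force sketch, not the FP-CDS algorithm or the FP-CDS tree, whose details are not given in this abstract. It assumes an absolute support threshold (`min_count`) and uses hypothetical helper names (`update_counts`, `frequent_closed`). Supports accumulate from the landmark point onward, one basic window at a time, and an itemset is reported as closed when no proper superset has the same support.

```python
from collections import Counter
from itertools import combinations


def update_counts(counts, basic_window):
    """Add every non-empty sub-itemset of each transaction to the running counts.

    Exhaustive enumeration is exponential in transaction length; it is used
    here only to illustrate the definitions on toy data.
    """
    for transaction in basic_window:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1


def frequent_closed(counts, min_count):
    """Return frequent itemsets that have no proper superset of equal support."""
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    closed = {}
    for itemset, support in frequent.items():
        # A superset with equal support is itself frequent, so searching
        # only among the frequent itemsets is sufficient.
        has_equal_superset = any(
            itemset < other and support == other_support
            for other, other_support in frequent.items()
        )
        if not has_equal_superset:
            closed[itemset] = support
    return closed


# Toy stream (assumed data): the landmark window grows by one basic window per step.
stream = [
    [["a", "b", "c"], ["a", "b"]],   # basic window 1
    [["a", "b", "c"], ["a", "c"]],   # basic window 2
]

counts = Counter()   # supports since the landmark point
snapshots = []       # closed frequent itemsets after each basic window
for basic_window in stream:
    update_counts(counts, basic_window)
    snapshots.append(frequent_closed(counts, min_count=2))
```

After the second basic window, the four closed itemsets {a}, {a, b}, {a, c}, and {a, b, c} with their supports are enough to recover the support of all seven frequent itemsets, which is the compression that the first sentence of the abstract refers to.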
