Moment: maintaining closed frequent itemsets over a stream sliding window

This paper considers the problem of mining closed frequent itemsets over a sliding window using limited memory space. We design a synopsis data structure to monitor transactions in the sliding window so that we can output the current closed frequent itemsets at any time. Due to time and memory constraints, the synopsis data structure cannot monitor all possible itemsets. However, monitoring only frequent itemsets makes it impossible to detect new itemsets when they become frequent. In this paper, we introduce a compact data structure, the closed enumeration tree (CET), to maintain a dynamically selected set of itemsets over a sliding window. The selected itemsets form a boundary between the closed frequent itemsets and the rest of the itemsets. Concept drift in a data stream is reflected by boundary movements in the CET. In other words, a status change of any itemset (e.g., from non-frequent to frequent) must occur through the boundary. Because the boundary is relatively stable, the cost of mining closed frequent itemsets over a sliding window is dramatically reduced to that of mining only the transactions that can possibly cause boundary movements in the CET. Our experiments show that our algorithm performs much better than previous approaches.
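To make the problem statement concrete, the following is a minimal brute-force sketch of what "the closed frequent itemsets of the current window" means. It is not the Moment algorithm and does not build a CET: it recomputes everything from scratch on each arrival, which is the kind of cost an incremental structure is meant to avoid. It uses the standard definition that a frequent itemset is closed when no proper superset has the same support. The names `closed_frequent_itemsets`, `slide`, `window_size`, and `min_support` are hypothetical, chosen only for this illustration, and the enumeration is exponential in transaction length, so it is suitable only for tiny examples.

```python
from collections import deque
from itertools import combinations


def closed_frequent_itemsets(window, min_support):
    """Brute-force baseline: return {itemset: support} for the closed frequent itemsets."""
    # Count the support of every non-empty subset of every transaction.
    support = {}
    for transaction in window:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                support[key] = support.get(key, 0) + 1

    # Keep only itemsets that meet the (absolute) minimum support.
    frequent = {s: c for s, c in support.items() if c >= min_support}

    # An itemset is closed when no proper superset has the same support.
    # Any such superset would itself be frequent, so checking `frequent` suffices.
    closed = {}
    for itemset, count in frequent.items():
        has_equal_superset = any(
            itemset < other and count == other_count
            for other, other_count in frequent.items()
        )
        if not has_equal_superset:
            closed[itemset] = count
    return closed


def slide(stream, window_size, min_support):
    """Yield the closed frequent itemsets after each arrival once the window is full."""
    window = deque(maxlen=window_size)  # the oldest transaction is dropped automatically
    for transaction in stream:
        window.append(transaction)
        if len(window) == window_size:
            yield closed_frequent_itemsets(window, min_support)
```

For example, with the stream `[{"a","b"}, {"a","b","c"}, {"a","c"}, {"b","c"}]`, a window of 3 transactions, and a minimum support of 2, the first full window yields `{a}: 3, {a,b}: 2, {a,c}: 2`, and after one slide the result becomes `{c}: 3, {a,c}: 2, {b,c}: 2`. Only a few itemsets change status between the two windows, which is the stability the abstract attributes to the boundary.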
