EclatDS: An efficient sliding window based frequent pattern mining method for data streams

Mining frequent patterns over data streams is an interesting problem because of its wide range of applications. Researchers in this field face two key challenges: reducing runtime and reducing memory usage. This study proposes a novel method for efficiently mining frequent patterns over data streams. The method is based on the sliding window model and divides the window into a number of panes. It introduces a new sliding window mechanism that uses a set of simple, short lists, each of which stores information about one item in the sliding window. The proposed mechanism dynamically adapts itself to concept change. The method is empirically evaluated against recently proposed pane-based sliding window algorithms. Experimental results on synthetically generated and real-life data streams show that the proposed method outperforms the other pane-based sliding window algorithms by multiple orders of magnitude in both runtime and memory usage.
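To make the idea of a pane-based window with per-item lists more concrete, the following is a minimal Python sketch. It is not the EclatDS algorithm itself, whose details the abstract does not give. It assumes that each per-item list is a set of transaction ids (a vertical, Eclat-style layout), that the window covers only completed panes, and that support is an absolute count. The names `PaneWindow`, `panes_per_window`, `pane_size`, and `frequent_itemsets` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict, deque


class PaneWindow:
    """Illustrative pane-based sliding window (hypothetical, not the paper's code).

    Each pane keeps one tid set per item; sliding drops the oldest pane.
    """

    def __init__(self, panes_per_window, pane_size):
        self.panes_per_window = panes_per_window
        self.pane_size = pane_size
        self.panes = deque()               # completed panes: dict item -> set of tids
        self.current = defaultdict(set)    # pane currently being filled
        self.filled = 0                    # transactions in the current pane
        self.next_tid = 0

    def add(self, transaction):
        """Append one transaction; close the pane and slide when it is full."""
        tid = self.next_tid
        self.next_tid += 1
        for item in set(transaction):
            self.current[item].add(tid)
        self.filled += 1
        if self.filled == self.pane_size:
            self.panes.append(dict(self.current))
            self.current = defaultdict(set)
            self.filled = 0
            if len(self.panes) > self.panes_per_window:
                self.panes.popleft()       # slide: discard the oldest pane

    def _tidlists(self):
        """Merge the per-pane lists into one tid set per item for the window."""
        merged = defaultdict(set)
        for pane in self.panes:
            for item, tids in pane.items():
                merged[item] |= tids
        return merged

    def frequent_itemsets(self, min_support):
        """Depth-first, Eclat-style search using tid-set intersections."""
        result = {}
        items = sorted(self._tidlists().items(), key=lambda kv: kv[0])
        self._search((), items, min_support, result)
        return result

    def _search(self, prefix, items, min_support, result):
        for i, (item, tids) in enumerate(items):
            if len(tids) < min_support:
                continue
            itemset = prefix + (item,)
            result[itemset] = len(tids)
            # Extend the itemset with every later item by intersecting tid sets.
            suffix = [(other, tids & other_tids) for other, other_tids in items[i + 1:]]
            self._search(itemset, suffix, min_support, result)


if __name__ == "__main__":
    window = PaneWindow(panes_per_window=2, pane_size=2)
    stream = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"], ["a", "b"], ["b"]]
    for transaction in stream:
        window.add(transaction)
    print(window.frequent_itemsets(min_support=2))
```

Because each pane owns its lists, sliding the window in this sketch only requires discarding the oldest pane rather than rescanning transactions, which is the kind of saving that pane-based designs aim for.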
