Mining Frequent Itemsets with Normalized Weight in Continuous Data Streams

Abstract—A data stream is a massive, unbounded sequence of data elements generated continuously at a rapid rate. Because streaming data arrive continuously, knowledge discovery over a stream requires algorithms that scan the stream only once. Data mining over data streams should also support a flexible trade-off between processing time and mining accuracy. In many application areas, weighted frequent itemset mining has been proposed to find important itemsets by taking the weight of each itemset into account, not only its frequency. In this paper, we present WSFI-Mine (where WSFI stands for Weighted Support Frequent Itemsets), an efficient algorithm for mining frequent itemsets with normalized weights over data streams. We also propose a novel tree structure, the Weighted Support FP-Tree (WSFP-Tree), which stores compressed essential information about frequent itemsets. Empirical results show that our algorithm outperforms the compared algorithms under the windowed streaming model.

Keywords—Frequent Itemsets, Weighted Support, Sliding Window
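The abstract does not spell out how the normalized weight or the weighted support is defined, so the following is only a minimal sketch under stated assumptions: raw item weights are scaled into (0, 1] by dividing by the largest raw weight, an itemset's weight is the mean of its normalized item weights, and weighted support is that weight times the itemset's relative frequency inside a count-based sliding window. It is a naive baseline that illustrates the measure, not WSFI-Mine and not the WSFP-Tree; the class `WindowedWeightedSupport` and its methods are hypothetical names.

```python
from collections import Counter, deque
from itertools import combinations


class WindowedWeightedSupport:
    """Hypothetical sketch, NOT the paper's WSFI-Mine or WSFP-Tree.

    Assumed definitions (not taken from the paper):
      normalized item weight  w(i)  = raw(i) / max raw weight, so w(i) is in (0, 1]
      itemset weight          w(X)  = mean of w(i) over the items i in X
      weighted support        ws(X) = w(X) * count(X) / |window|
    """

    def __init__(self, raw_weights, window_size, max_len=3):
        top = max(raw_weights.values())
        # Normalize every raw item weight by the largest one (assumption).
        self.weight = {item: w / top for item, w in raw_weights.items()}
        self.window_size = window_size  # count-based window: last N transactions
        self.max_len = max_len          # longest itemset that is tracked
        self.window = deque()           # transactions currently inside the window
        self.counts = Counter()         # sorted itemset tuple -> count in window

    def _subsets(self, transaction):
        # All non-empty sub-itemsets up to max_len, as sorted tuples.
        items = sorted(transaction)
        for k in range(1, min(self.max_len, len(items)) + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        """Process one arriving transaction; the stream is read only once,
        and the buffered copy is used again only when it expires."""
        transaction = frozenset(transaction)
        self.window.append(transaction)
        self.counts.update(self._subsets(transaction))
        if len(self.window) > self.window_size:
            expired = self.window.popleft()  # oldest transaction leaves the window
            self.counts.subtract(self._subsets(expired))
            self.counts = +self.counts       # unary plus drops zero counts

    def weighted_support(self, itemset):
        if not self.window:
            return 0.0
        key = tuple(sorted(itemset))
        w = sum(self.weight[i] for i in key) / len(key)
        return w * self.counts[key] / len(self.window)

    def frequent(self, min_wsup):
        """Tracked itemsets whose weighted support is at least min_wsup."""
        result = {}
        for key in self.counts:
            ws = self.weighted_support(key)
            if ws >= min_wsup:
                result[key] = ws
        return result


if __name__ == "__main__":
    miner = WindowedWeightedSupport({"a": 2.0, "b": 4.0, "c": 1.0}, window_size=3)
    for t in [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"a", "b"}]:
        miner.add(t)  # the first {"a", "b"} expires on the fourth add
    print(sorted(miner.frequent(0.4)))  # [('a', 'b'), ('b',), ('b', 'c')]
```

Under these assumed definitions weighted support is not anti-monotone: in the example, {a, b} scores 0.5 while {a} alone scores about 0.33. That is why the sketch counts every sub-itemset up to `max_len` instead of pruning Apriori-style, and that per-transaction subset enumeration is the kind of cost a compressed structure such as the proposed WSFP-Tree would presumably aim to reduce.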
