An Algorithm for Mining Frequent Stream Data Items Using Hash Function and Fading Factor

This paper presents a new algorithm for mining frequent items in data streams. The algorithm adopts a time fading factor to emphasize relatively newer data and records the densities of the data items in hash tables. Given a density threshold S and an integer k, the algorithm mines the top-k frequent items. The computation time for processing each data item is O(1). Experimental results show that the algorithm outperforms other methods in accuracy, memory requirements, and processing speed.
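The following is a minimal sketch of the general idea of fading-factor density counting. It is not the authors' algorithm, whose details the abstract does not give. It assumes that each arriving item adds 1 to its own density and that all densities decay by a constant factor per time step. A Python dict stands in for the hash tables, and the class name `FadingCounter`, its methods, and the `fading` parameter are hypothetical. Decay is applied lazily, so each update touches a single entry and takes O(1) expected time. The sketch does not bound memory.

```python
import heapq


class FadingCounter:
    """Illustrative fading-factor density counter (assumed design)."""

    def __init__(self, fading=0.5):
        if not 0.0 < fading <= 1.0:
            raise ValueError("fading must be in (0, 1]")
        self.fading = fading
        self.now = 0        # logical clock: number of items seen so far
        self.table = {}     # item -> (density, time of last update)

    def add(self, item):
        """Process one stream item in O(1) expected time."""
        self.now += 1
        density, last = self.table.get(item, (0.0, self.now))
        # Lazy decay: fade the stored density over the elapsed steps,
        # then add 1 for the new occurrence.
        faded = density * self.fading ** (self.now - last)
        self.table[item] = (faded + 1.0, self.now)

    def density(self, item):
        """Return the item's density decayed to the current time."""
        density, last = self.table.get(item, (0.0, self.now))
        return density * self.fading ** (self.now - last)

    def top_k(self, k, threshold):
        """Return up to k items with density >= threshold, densest first."""
        kept = [(self.density(x), x) for x in self.table]
        kept = [pair for pair in kept if pair[0] >= threshold]
        best = heapq.nlargest(k, kept, key=lambda pair: pair[0])
        return [x for _, x in best]


counter = FadingCounter(fading=0.5)
for item in ["a", "a", "b"]:
    counter.add(item)
print(counter.top_k(k=2, threshold=0.5))
```

With a fading factor of 0.5, the single recent occurrence of "b" outweighs the two older occurrences of "a", which illustrates how the fading factor emphasizes newer data.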
