An Efficient Algorithm for Finding Frequent Items in a Stream

Most existing algorithms for mining frequent items over data streams do not emphasize the importance of more recent data items. We present an efficient algorithm in which a fading factor lambda is used to compute frequency counts that exceed a user-specified threshold over data streams. Our algorithm, lambda-Miner, detects epsilon-approximate frequent items of a data stream using O(epsilon^{-1}) memory space, with O(1) processing time per data item. Experimental results on several artificial and real data sets show that lambda-Miner outperforms lambda-LC in terms of precision, memory requirement, and time cost.
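
To make the idea of a fading factor concrete, the following Python sketch shows one common way to keep time-decayed counts over a stream. It is not the lambda-Miner algorithm described above, whose details are not given in this abstract, and it carries no claim about the stated O(epsilon^{-1}) space or O(1) time bounds. The class name `DecayedCounter`, its methods, and the pruning rule (drop entries whose decayed count falls below epsilon times the decayed stream weight, checked every ceil(1/epsilon) arrivals) are all illustrative assumptions.

```python
import math


class DecayedCounter:
    """Illustrative fading-factor counter (hypothetical, not lambda-Miner)."""

    def __init__(self, lam, epsilon):
        if not 0.0 < lam <= 1.0:
            raise ValueError("lam must be in (0, 1]")
        if not 0.0 < epsilon < 1.0:
            raise ValueError("epsilon must be in (0, 1)")
        self.lam = lam            # fading factor: weight kept per time step
        self.epsilon = epsilon    # error parameter used for pruning
        self.t = 0                # number of items seen so far
        self.total = 0.0          # decayed weight of the whole stream
        self.entries = {}         # item -> (count, time of last update)
        self.prune_every = math.ceil(1.0 / epsilon)

    def decayed_count(self, item):
        """Return the item's count faded to the current time (0.0 if absent)."""
        if item not in self.entries:
            return 0.0
        count, last = self.entries[item]
        return count * self.lam ** (self.t - last)

    def add(self, item):
        """Process one stream item; older contributions fade by lam per step."""
        self.t += 1
        self.total = self.total * self.lam + 1.0
        # Lazy decay: only the arriving item's entry is brought up to date.
        self.entries[item] = (self.decayed_count(item) + 1.0, self.t)
        if self.t % self.prune_every == 0:
            self._prune()

    def _prune(self):
        """Assumed pruning rule: drop entries below epsilon * total."""
        cutoff = self.epsilon * self.total
        self.entries = {
            item: entry
            for item, entry in self.entries.items()
            if self.decayed_count(item) >= cutoff
        }

    def frequent(self, support):
        """Items whose decayed count is at least (support - epsilon) * total."""
        threshold = (support - self.epsilon) * self.total
        result = {}
        for item in self.entries:
            value = self.decayed_count(item)
            if value >= threshold:
                result[item] = value
        return result
```

With a fading factor below 1, an item that dominated the early part of the stream but has since stopped appearing drops out of the result, while with `lam = 1.0` the counts reduce to ordinary undecayed counts. For instance, feeding fifty occurrences of one item followed by fifty of another into `DecayedCounter(lam=0.9, epsilon=0.1)` and calling `frequent(0.5)` reports only the more recent item.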
