Online Mining Changes of Items over Continuous Append-only and Dynamic Data Streams

Online mining of changes over data streams is recognized as an important task in data mining, and it is both compelling and challenging. In this paper, we propose a new single-pass algorithm, called MFC-append (Mining Frequency Changes of append-only data streams), for discovering frequent frequency-changed items, vibrated frequency-changed items, and stable frequency-changed items over continuous append-only data streams. A new summary data structure, called Change-Sketch, is developed to compute the frequency changes between two continuous data streams as quickly as possible. Moreover, an MFC-append-based algorithm, called MFC-dynamic (Mining Frequency Changes of dynamic data streams), is proposed to find frequency changes over dynamic data streams. Theoretical analysis and experimental results show that our algorithms meet the major performance requirements of data stream mining, namely single-pass processing, bounded space, and real-time computing.
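
To make the notion of a frequency change between two streams concrete, the following minimal sketch computes, for each item, the difference between its relative frequency in a second stream and in a first stream. It is an illustration under stated assumptions, not the MFC-append algorithm or the Change-Sketch structure, whose details the abstract does not give. The function name `frequency_changes` and the use of relative frequency as the change measure are assumptions made here, and the exact counters used below do not satisfy the bounded-space requirement.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def frequency_changes(
    stream_a: Iterable[Hashable],
    stream_b: Iterable[Hashable],
) -> Dict[Hashable, float]:
    """Per item: relative frequency in stream_b minus that in stream_a.

    Hypothetical helper for illustration only. It keeps one exact counter
    per distinct item, so its space grows with the number of distinct items.
    """
    counts_a: Counter = Counter()
    counts_b: Counter = Counter()
    n_a = 0
    n_b = 0

    # Each stream is read exactly once (single pass).
    for item in stream_a:
        counts_a[item] += 1
        n_a += 1
    for item in stream_b:
        counts_b[item] += 1
        n_b += 1

    changes: Dict[Hashable, float] = {}
    for item in counts_a.keys() | counts_b.keys():
        # An empty stream contributes a relative frequency of 0 for every item.
        freq_a = counts_a[item] / n_a if n_a else 0.0
        freq_b = counts_b[item] / n_b if n_b else 0.0
        changes[item] = freq_b - freq_a
    return changes
```

For example, calling `frequency_changes("aabc", "abbb")` yields a positive change for `b`, whose share grows in the second stream, and negative changes for `a` and `c`. A thresholding step on these values, with thresholds chosen by the user, would be one way to separate items whose frequency changed from items whose frequency stayed stable.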
