Data Stream Mining Using Granularity-Based Approach

Many important applications require data stream mining algorithms to run in resource-constrained environments. Adaptation is therefore a key process for ensuring the consistency and continuity of the running algorithms. This chapter provides a theoretical framework for applying the granularity-based approach to mining data streams. Our Algorithm Output Granularity (AOG) approach is explained in detail, giving practitioners the ability to use it to enable resource-awareness and adaptability in their algorithms. AOG is formalized using the Probably Approximately Correct (PAC) learning model, which allows researchers to formalize the adaptability of their own techniques. Finally, the integration of AOG with other adaptation strategies is discussed.
