Classification of changes in evolving data streams using online clustering result deviation

Data stream analysis has attracted considerable attention over the past few years. Streaming data often evolves over time, and capturing its changes can be used to detect an event or phenomenon. Applications range from weather conditions and economic changes to astronomical and other scientific phenomena. Because data streams arrive at high volume and speed, capturing these changes from the raw data in real time is computationally hard. In this paper, we propose a novel algorithm, which we term STREAM-DETECT, that captures changes in the distribution and/or domain of a data stream using clustering result deviation. STREAM-DETECT is followed by an offline classification process, CHANGE-CLASS, which associates the history of change characteristics with the observed event or phenomenon. Experimental results show that the proposed framework both detects changes efficiently and classifies them accurately.
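The idea of detecting change through clustering result deviation can be illustrated with a minimal sketch. This is not the STREAM-DETECT algorithm itself: the abstract does not specify the clustering method, the deviation measure, or the threshold, so the choices below (a tiny 1-D k-means per fixed-size window, mean absolute centroid shift as the deviation, and the names `kmeans_1d`, `deviation`, and `detect_changes`) are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(points: Sequence[float], k: int, iters: int = 10) -> List[float]:
    """Tiny deterministic 1-D k-means (an assumed stand-in clusterer)."""
    pts = sorted(points)
    n = len(pts)
    # Initialise centroids at evenly spaced quantiles of the window.
    cents = [pts[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda c: abs(p - cents[c]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster received no points.
        cents = [sum(g) / len(g) if g else cents[i] for i, g in enumerate(groups)]
    return sorted(cents)


def deviation(prev: Sequence[float], curr: Sequence[float]) -> float:
    """Mean absolute shift between matched (sorted) centroids."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(prev)


def detect_changes(
    stream: Sequence[float], window: int, k: int, threshold: float
) -> List[Tuple[int, float]]:
    """Return (window_index, deviation) for windows whose clustering
    result deviates from the previous window by more than threshold."""
    changes = []
    prev = None
    for w, start in enumerate(range(0, len(stream) - window + 1, window)):
        cents = kmeans_1d(stream[start:start + window], k)
        if prev is not None:
            d = deviation(prev, cents)
            if d > threshold:
                changes.append((w, d))
        prev = cents
    return changes


# Two stable windows followed by one whose clusters have moved.
stream = [0.0, 1.0, 10.0, 11.0] * 4 + [5.0, 6.0, 20.0, 21.0] * 2
print(detect_changes(stream, window=8, k=2, threshold=1.0))  # [(2, 7.5)]
```

Only the per-window centroids are compared, so the raw points of earlier windows never need to be retained. The list of `(window_index, deviation)` pairs is one possible form of the change history that an offline classifier could later associate with observed events.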
