On-board Mining of Data Streams in Sensor Networks

Data streams are generated in large quantities and at rapid rates by sensor networks that monitor environmental, traffic and weather conditions, among others. A significant challenge in sensor networks is the analysis of the vast amounts of data that are rapidly generated and transmitted through sensing. Because wired communication is infeasible in these monitoring settings, the data are currently sent for analysis over satellite channels, and satellite communication is exorbitantly expensive. To address this issue, we propose a strategy for on-board mining of data streams in a resource-constrained environment. We have developed a novel approach that dynamically adapts the data-stream mining process according to the available memory. This adaptation is algorithm-independent and enables data-stream mining algorithms to cope with high data rates despite finite computational resources. We have also developed lightweight data-stream mining algorithms that incorporate our adaptive mining approach for resource-constrained environments.
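The abstract does not spell out the adaptation mechanism. To illustrate the general idea of adapting a mining algorithm to a memory budget, the Python sketch below makes two hypothetical assumptions. Memory is modelled as a cap on the number of stored cluster centres, `max_centers`. The algorithm responds to memory pressure by widening its merge `threshold` by a factor `growth`, which trades output detail (fewer, coarser clusters) for a bounded footprint. The class `AdaptiveStreamClusterer` and all of its parameters are illustrative assumptions, not the authors' algorithms.

```python
"""Sketch: memory-adaptive one-pass clustering of a 1-D sensor stream.

Hypothetical illustration only: `max_centers` stands in for the available
memory budget, and the threshold-growth rule is an assumed policy.
"""
from dataclasses import dataclass, field


@dataclass
class AdaptiveStreamClusterer:
    max_centers: int          # hypothetical memory budget (number of stored centres)
    threshold: float          # initial merge distance
    growth: float = 1.5       # assumed factor for widening the threshold under memory pressure
    centers: list = field(default_factory=list)  # [centre, weight] pairs

    def __post_init__(self) -> None:
        # A positive threshold and growth > 1 guarantee the merge loop terminates.
        if self.max_centers < 1 or self.threshold <= 0 or self.growth <= 1:
            raise ValueError("need max_centers >= 1, threshold > 0, growth > 1")

    def update(self, x: float) -> None:
        """Absorb one reading in a single pass; coarsen the output if memory is exhausted."""
        if self.centers:
            nearest = min(self.centers, key=lambda c: abs(c[0] - x))
            if abs(nearest[0] - x) <= self.threshold:
                # Fold x into the nearest centre as a weighted mean.
                centre, weight = nearest
                nearest[0] = (centre * weight + x) / (weight + 1)
                nearest[1] = weight + 1
                return
        self.centers.append([x, 1])
        # Memory pressure: widen the threshold and merge nearby centres until within budget.
        while len(self.centers) > self.max_centers:
            self.threshold *= self.growth
            self._merge_close_centers()

    def _merge_close_centers(self) -> None:
        """Merge adjacent centres (in sorted order) that lie within the current threshold."""
        self.centers.sort(key=lambda c: c[0])
        merged = [self.centers[0]]
        for centre, weight in self.centers[1:]:
            last = merged[-1]
            if centre - last[0] <= self.threshold:
                total = last[1] + weight
                last[0] = (last[0] * last[1] + centre * weight) / total
                last[1] = total
            else:
                merged.append([centre, weight])
        self.centers = merged


# Usage: eight readings, but memory for at most three centres.
clusterer = AdaptiveStreamClusterer(max_centers=3, threshold=0.5)
for reading in [1.0, 1.2, 5.0, 5.1, 9.0, 9.3, 13.0, 20.0]:
    clusterer.update(reading)
print(len(clusterer.centers), sum(w for _, w in clusterer.centers))  # 3 8
```

The design choice here is that the memory footprint stays bounded no matter how fast readings arrive: when the budget is hit, the sketch gives up resolution rather than data. How a real system would measure available memory and choose the growth rate is left open.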