Clustering Distributed Time Series in Sensor Networks

Event detection is a critical task in sensor networks, especially for environmental monitoring applications. Traditional event detection solutions analyze one-shot data points, which can lead to a high false alarm rate because sensor data is inherently unreliable and noisy. To address this issue, we propose a novel Distributed Single-pass Incremental Clustering (DSIC) technique that clusters the time series obtained at sensor nodes according to their underlying trends. To achieve scalability and energy efficiency, DSIC uses a hierarchical structure of the sensor network as its underlying infrastructure. The algorithm first compresses the time series produced at each sensor node into a compact representation using the Haar wavelet transform. It then incrementally and hierarchically groups these approximate time series into a global clustering model based on their dynamic time warping (DTW) distances. Experimental results on both real and synthetic data demonstrate that DSIC is accurate, energy-efficient, and robust to network topology changes.
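The pipeline summarized above (wavelet compression, DTW comparison, single-pass clustering, hierarchical merging) can be illustrated with a minimal Python sketch. This is not the DSIC algorithm itself. It assumes an unnormalized Haar transform that keeps only the coarsest averages, classic DTW with an absolute-difference cost, and a leader-style rule in which a series joins the DTW-nearest cluster if that cluster's prototype lies within a distance `threshold`, and otherwise starts a new cluster. A higher-level cluster head (`merge_models`) folds its children's cluster models together with the same rule. All function names and the parameters `levels` and `threshold` are hypothetical.

```python
import math
from dataclasses import dataclass, field


def haar_approx(series, levels):
    """Unnormalized Haar approximation: keep only the pairwise averages after
    `levels` decomposition steps (detail coefficients are discarded)."""
    if len(series) % (2 ** levels) != 0:
        raise ValueError("series length must be divisible by 2**levels")
    approx = [float(v) for v in series]
    for _ in range(levels):
        approx = [(approx[i] + approx[i + 1]) / 2 for i in range(0, len(approx), 2)]
    return approx


def dtw(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping, absolute-difference cost."""
    n, m = len(x), len(y)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


@dataclass
class Cluster:
    prototype: list  # representative series: the first series that opened the cluster
    members: list = field(default_factory=list)  # sensor node ids


def assign(clusters, prototype, members, threshold):
    """Leader-style single-pass step (an assumption, not the paper's rule): join the
    DTW-nearest cluster if it lies within `threshold`, else open a new cluster."""
    best, best_d = None, math.inf
    for c in clusters:
        dist = dtw(prototype, c.prototype)
        if dist < best_d:
            best, best_d = c, dist
    if best is not None and best_d <= threshold:
        best.members.extend(members)
    else:
        clusters.append(Cluster(list(prototype), list(members)))


def local_model(readings, levels, threshold):
    """Lowest-level cluster head: Haar-compress each member's series (in a real
    deployment this would happen on the sensor node), then cluster in one pass."""
    clusters = []
    for node_id, series in readings.items():
        assign(clusters, haar_approx(series, levels), [node_id], threshold)
    return clusters


def merge_models(models, threshold):
    """Higher-level head: fold the children's cluster models into one in a single pass."""
    merged = []
    for model in models:
        for c in model:
            assign(merged, c.prototype, c.members, threshold)
    return merged


rising = [1, 2, 3, 4, 5, 6, 7, 8]
rising_late = [1, 1, 2, 3, 4, 5, 6, 7]  # same trend, delayed by one step
falling = [8, 7, 6, 5, 4, 3, 2, 1]
head_a = local_model({"s1": rising, "s2": falling}, levels=1, threshold=4.0)
head_b = local_model({"s3": rising_late}, levels=1, threshold=4.0)
root = merge_models([head_a, head_b], threshold=4.0)
print(sorted(sorted(c.members) for c in root))  # → [['s1', 's3'], ['s2']]
```

In this toy run, `s3` follows the same trend as `s1` one step later, so its DTW distance to `s1`'s prototype is only 3.5 and it joins that cluster at the root, while the falling series `s2` stays 16.0 away in its own cluster. Only prototypes and member ids travel up the hierarchy rather than raw readings, which is where this sketch saves communication. Using the first member as the prototype keeps every step single-pass, but it also makes the result depend on arrival order.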
