Mining Evolutionary Events from Multi-Streams Based on Spectral Clustering

To address the problem of mining evolutionary events from multi-streams, this paper proposes a spectral clustering algorithm, SCAM (spectral clustering algorithm of multi-streams), to generate clustering models of multi-streams. The similarity matrix in these clustering models is based on the Coupling Degree, which measures the dynamic similarity between two streams. The paper also proposes an algorithm, EEMA (evolutionary events mining algorithm), to discover evolutionary event points based on the drift of the clustering models. EEMA takes the Clustering Model Quality index as its optimization objective when determining the number of clusters automatically. The Clustering Model Quality combines matrix perturbation theory with the Clustering Cohesion, which has a sound upper bound and measures the compactness of a clustering model. Finally, this paper presents O-EEMA (optimized-EEMA), an optimization of EEMA with a time complexity of O(cn^2/2), and the results of extensive experiments on synthetic and real data sets show that EEMA
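
The pipeline described above (a similarity matrix over streams, spectral clustering with an automatically chosen number of clusters, and event points where the clustering model drifts) can be illustrated with a minimal Python sketch. It is not the paper's SCAM or EEMA. The abstract does not define the Coupling Degree or the Clustering Model Quality, so the sketch substitutes absolute Pearson correlation for the similarity and the eigengap heuristic for the choice of the number of clusters. All function names, the fixed non-overlapping windows, and the demo data are assumptions made for illustration.

```python
import numpy as np

def similarity_matrix(window):
    # Assumption: absolute Pearson correlation stands in for the Coupling Degree.
    # Rows are streams; constant streams would yield NaN and are not handled.
    return np.abs(np.corrcoef(window))

def laplacian_eigenpairs(W):
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.linalg.eigh(L)  # eigenvalues in ascending order

def choose_k(eigenvalues, k_max):
    # Assumption: eigengap heuristic stands in for Clustering Model Quality.
    gaps = np.diff(eigenvalues[:k_max + 1])
    return int(np.argmax(gaps)) + 1

def kmeans(X, k, n_iter=50):
    # Plain Lloyd iterations with deterministic farthest-point initialization.
    centers = [X[0]]
    while len(centers) < k:
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def cluster_window(window, k_max=4):
    # Spectral clustering of the streams observed in one time window.
    eigenvalues, eigenvectors = laplacian_eigenpairs(similarity_matrix(window))
    k = choose_k(eigenvalues, k_max)
    U = eigenvectors[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize embedding
    return kmeans(U, k)

def event_points(streams, width, k_max=4):
    # Flag the start of each window whose partition differs from the previous one.
    events, previous = [], None
    for start in range(0, streams.shape[1] - width + 1, width):
        labels = cluster_window(streams[:, start:start + width], k_max)
        current = labels[:, None] == labels[None, :]  # co-membership matrix
        if previous is not None and not np.array_equal(previous, current):
            events.append(start)
        previous = current
    return events

def make_demo_streams(width=200, seed=0):
    # Hypothetical data: six noisy streams; stream 2 switches group in window 2.
    rng = np.random.default_rng(seed)
    t = np.linspace(0, 4 * np.pi, width, endpoint=False)
    a, b = np.sin(t), np.cos(t)
    first = np.stack([a, a, a, b, b, b])
    second = np.stack([a, a, b, b, b, b])
    streams = np.hstack([first, second])
    return streams + 0.05 * rng.standard_normal(streams.shape)
```

On the hypothetical demo data, event_points(make_demo_streams(), width=200) flags the window starting at sample 200, where stream 2 moves from the sine group to the cosine group. Comparing co-membership matrices rather than raw labels avoids false alarms caused by arbitrary label permutations between windows.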
