Paper Information - ODAC: Hierarchical Clustering of Time Series Data Streams

ODAC: Hierarchical Clustering of Time Series Data Streams

This paper presents a system for whole clustering of time series that incrementally constructs a tree-like hierarchy of clusters using a top-down strategy. The Online Divisive-Agglomerative Clustering (ODAC) system uses a correlation-based dissimilarity measure between time series over a data stream, and includes an agglomerative phase that gives it the dynamic behavior needed to detect concept drift. Its main features are splitting and agglomerative criteria that are based on the diameters of existing clusters and supported by a significance level. With each new example, only the leaves are updated, which avoids computing unneeded dissimilarities and speeds up processing each time the structure grows. Experimental results on artificial and real data suggest competitive performance in clustering time series and show that the system is equivalent to batch divisive clustering on stationary time series, while also being able to deal with concept drift. This work demonstrates the feasibility and importance of hierarchical incremental whole clustering of time series in the data stream paradigm, and offers a valuable and usable option.
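The abstract names three ingredients: a correlation-based dissimilarity, a cluster diameter, and a split test backed by a significance level. The following is a minimal sketch of how these could fit together for a single leaf, not the authors' implementation. The abstract gives no formulas, so everything concrete here is an assumption: the dissimilarity sqrt((1 - corr) / 2), the Hoeffding-style bound used as the significance test, the rule of comparing the two largest dissimilarities, and all names (`LeafStats`, `hoeffding_bound`, `should_split`, `delta`).

```python
import math
from itertools import combinations


class LeafStats:
    """Running sums kept by one leaf cluster (hypothetical helper).

    Only sums are stored, so each new example is absorbed in one pass
    and no past observations need to be kept.
    """

    def __init__(self, names):
        self.names = list(names)
        self.n = 0
        self.s = {v: 0.0 for v in self.names}    # sum of x
        self.s2 = {v: 0.0 for v in self.names}   # sum of x^2
        # sum of x*y for every unordered pair, keyed in the order of `names`
        self.sp = {p: 0.0 for p in combinations(self.names, 2)}

    def update(self, example):
        """Absorb one example: a dict mapping series name to its new value."""
        self.n += 1
        for v in self.names:
            x = example[v]
            self.s[v] += x
            self.s2[v] += x * x
        for a, b in self.sp:
            self.sp[(a, b)] += example[a] * example[b]

    def correlation(self, a, b):
        """Pearson correlation recovered from the running sums."""
        n = self.n
        cov = self.sp[(a, b)] - self.s[a] * self.s[b] / n
        va = self.s2[a] - self.s[a] ** 2 / n
        vb = self.s2[b] - self.s[b] ** 2 / n
        if va <= 0 or vb <= 0:
            return 0.0  # constant series: treat as uncorrelated (assumption)
        return max(-1.0, min(1.0, cov / math.sqrt(va * vb)))

    def dissimilarity(self, a, b):
        # Assumed form: maps correlation in [-1, 1] to a distance in [0, 1].
        return math.sqrt((1.0 - self.correlation(a, b)) / 2.0)

    def diameter(self):
        """Largest pairwise dissimilarity inside the cluster."""
        return max(self.dissimilarity(a, b) for a, b in self.sp)


def hoeffding_bound(n, delta, value_range=1.0):
    """Hoeffding bound for n observations at confidence 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(stats, delta=0.05):
    """Assumed split rule: the gap between the two largest dissimilarities
    must exceed the Hoeffding bound before the leaf is divided."""
    if stats.n < 2 or len(stats.sp) < 2:
        return False
    dists = sorted((stats.dissimilarity(a, b) for a, b in stats.sp), reverse=True)
    return (dists[0] - dists[1]) > hoeffding_bound(stats.n, delta)
```

In this sketch a leaf holding series `a`, `b`, and `c` would call `update` once per incoming example and then `should_split`. Because the bound shrinks as `n` grows (about 0.12 at `n = 100` with `delta = 0.05`), a leaf with few examples is left alone, while the same gap in dissimilarities can later justify a split.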
