Density-based Clustering of Time Series Subsequences

Doubts have been raised that time series subsequences can be clustered in a meaningful way. This paper introduces a kernel-density-based algorithm that detects meaningful patterns in the presence of a vast number of random-walk-like subsequences. The value of density-based algorithms for noise elimination in general has long been demonstrated. The challenge of applying such techniques to time-series data consists in first specifying uninteresting sequences that are to be considered as noise, and secondly ensuring that those uninteresting sequences will not affect the clustering result. Both problems are addressed in this paper and the success of the technique is demonstrated on several standard data sets.

[1]  Eamonn J. Keogh,et al.  UCR Time Series Data Mining Archive , 1983 .

[2]  Daniel A. Keim,et al.  A General Approach to Clustering in Large Databases with Noise , 2003, Knowledge and Information Systems.

[3]  Jeffrey M. Hausdorff,et al.  Physionet: Components of a New Research Resource for Complex Physiologic Signals". Circu-lation Vol , 2000 .

[4]  Eamonn J. Keogh,et al.  Mining motifs in massive time series databases , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[5]  Dimitrios Gunopulos,et al.  Discovering similar multidimensional trajectories , 2002, Proceedings 18th International Conference on Data Engineering.

[6]  Yizong Cheng,et al.  Mean Shift, Mode Seeking, and Clustering , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Jon Louis Bentley,et al.  Multidimensional binary search trees used for associative searching , 1975, CACM.

[8]  F. Reif,et al.  Fundamentals of Statistical and Thermal Physics , 1965 .

[9]  Heikki Mannila,et al.  Rule Discovery from Time Series , 1998, KDD.

[10]  M. B. Priestley,et al.  Non-linear and non-stationary time series analysis , 1990 .

[11]  Piotr Indyk,et al.  Mining the stock market (extended abstract): which measure is best? , 2000, KDD '00.

[12]  Allen Gersho,et al.  Vector quantization and signal compression , 1991, The Kluwer international series in engineering and computer science.

[13]  Eamonn J. Keogh,et al.  Clustering of time-series subsequences is meaningless: implications for previous and future research , 2004, Knowledge and Information Systems.

[14]  Donald J. Berndt,et al.  Finding Patterns in Time Series: A Dynamic Programming Approach , 1996, Advances in Knowledge Discovery and Data Mining.

[15]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..