Empirical Comparison of Clustering Methods for Long Time-Series Databases

In this paper we report characteristics of time-series comparison methods and clustering methods that we found empirically using a real-world medical database. First, we examined the basic characteristics of two sequence comparison methods, multiscale matching (MSM) and dynamic time warping (DTW), using a simple sine wave and its variants. Next, we examined various combinations of sequence comparison methods and clustering methods in terms of the interpretability of the generated clusters, using a time-series medical database. Although the subjects for comparison were limited, the results for the comparison methods demonstrated that (1) the shape representation parameters in MSM could capture the structural features of time series; for example, differences in amplitude were successfully captured by the rotation term, and differences in phase and trend were also reflected in the dissimilarity; and (2) the dissimilarity induced by MSM nevertheless lacks linearity compared with that of DTW. The results for the clustering methods demonstrated that (1) the complete-linkage criterion of agglomerative hierarchical clustering (CL-AHC) outperformed the average-linkage criterion (AL-AHC) in terms of the interpretability of the dendrogram and the clustering results, (2) the combination of DTW and CL-AHC consistently produced interpretable results, and (3) the combination of DTW and rough clustering (RC) could be used to find the core sequences of the clusters. MSM may suffer from the problem of 'no-match' pairs; however, this problem may be avoided by using RC as the subsequent grouping method.
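To make the sine-wave experiment concrete, the following is a minimal sketch of a textbook DTW dissimilarity applied to a sine wave and a few variants. It is not the implementation used in the paper: the function names `dtw_distance` and `sine_wave`, the absolute-difference local cost, the absence of a warping window, and the specific amplitude, phase, and trend values are all assumptions chosen for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping dissimilarity.

    Assumption: local cost is the absolute difference between points and
    no warping-window constraint is applied.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def sine_wave(n_points, amplitude=1.0, phase=0.0, trend=0.0, cycles=2.0):
    """Hypothetical test-signal generator: a sine wave with an optional linear trend."""
    return [
        amplitude * math.sin(2 * math.pi * cycles * k / n_points + phase)
        + trend * k / n_points
        for k in range(n_points)
    ]


base = sine_wave(100)
variants = {
    "identical": sine_wave(100),
    "amplitude x2": sine_wave(100, amplitude=2.0),
    "phase shift": sine_wave(100, phase=math.pi / 4),
    "upward trend": sine_wave(100, trend=1.0),
}

for name, series in variants.items():
    print(f"{name:>13}: {dtw_distance(base, series):.3f}")
```

Varying one parameter at a time in this way gives a simple check of how the dissimilarity responds to changes in amplitude, phase, or trend.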
