A Modified K-means Algorithm for Sequence Clustering

In this paper, we extend our earlier research to build a system that provides clustering services in addition to user-active search. We use DCT mapping to extract features from sequences and discuss two notions of sequence similarity: whole similarity and partial similarity. These two notions are applied when clustering equal-length and variable-length sequences, respectively. In the equal-length case, we map each sequence to an f-dimensional point in the feature space and then cluster the sequences by applying hierarchical clustering and partitional clustering (i.e., K-means). In the variable-length case, we cut each sequence into subsequences with a sliding window and map the subsequences to f-dimensional points. We propose a Modified K-means (MK) algorithm to handle partial similarity of subsequences. Finally, we implement our methods and report experiments that demonstrate the efficiency and effectiveness of our approach.
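
The pipeline described above can be illustrated with a minimal sketch. It assumes that "DCT" means the discrete cosine transform (an unnormalized DCT-II is used here), and it runs plain K-means rather than the proposed MK algorithm, whose details are not given in this abstract. The function names, the window length `w`, the feature count `f`, and the toy data are all hypothetical choices made for illustration.

```python
import math

def dct_features(seq, f):
    """First f coefficients of the unnormalized DCT-II of seq (assumed mapping)."""
    n = len(seq)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(seq))
        for k in range(f)
    ]

def sliding_windows(seq, w):
    """All contiguous subsequences of length w, using a step of 1."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

def sq_dist(p, q):
    """Squared Euclidean distance between two feature points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, init_centers, iters=20):
    """Plain K-means (Lloyd's iterations); NOT the paper's MK algorithm."""
    centers = [list(c) for c in init_centers]
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Equal-length case: one f-dimensional point per sequence (toy data).
seqs = [
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1, 1],
    [9, 8, 9, 8, 9, 8, 9, 8],
    [9, 8, 9, 8, 9, 9, 9, 8],
]
feats = [dct_features(s, f=2) for s in seqs]
labels, centers = kmeans(feats, init_centers=[feats[0], feats[2]])

# Variable-length case: one f-dimensional point per sliding-window subsequence.
sub_feats = [dct_features(sub, f=2) for sub in sliding_windows(seqs[0], w=4)]
```

In this sketch the equal-length sequences are clustered directly on their feature points, while the variable-length case only shows how subsequence points would be produced; grouping those points under partial similarity is the part the MK algorithm addresses.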
