Clustering of time series using hybrid symbolic aggregate approximation

Clustering time series is one of the best-known challenges in time series analysis, owing to both its difficulty and its wide range of potential applications. As in general data clustering, the task is to partition time series into groups by similarity, so that series within a cluster are similar to one another and dissimilar to those in other clusters. Over the last decade, symbolic aggregate approximation (SAX), a high-level symbolic representation of time series, has attracted the attention of many data mining researchers. SAX allows sequence mining techniques to be applied to time series analysis. In this study, we propose a new approach to clustering time series that combines a moving average convergence divergence (MACD)-histogram-based SAX (MHSAX) with the k-medoids method. MHSAX is a hybrid symbolic aggregate approximation that combines the SAX strings of a time series and of its MACD histogram. Using MHSAX, we can compute distances between time series more accurately than with other approaches, which improves compatibility with the k-medoids method and, in turn, clustering accuracy. We implemented the proposed clustering method and evaluated it on all data sets in the UCR Time Series Archive. The experimental results show that the proposed method outperforms other state-of-the-art methods.
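The abstract does not spell out how MHSAX is constructed, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes the usual SAX pipeline (z-normalization, piecewise aggregate approximation, discretization with equiprobable Gaussian breakpoints) and the conventional MACD spans of 12, 26, and 9. The function names (`sax`, `macd_histogram`, `hybrid_sax`), the segment count, and the alphabet size are hypothetical choices made for this example.

```python
from statistics import NormalDist

import numpy as np


def znorm(x):
    """Z-normalize a series; a constant series maps to all zeros."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)


def paa(x, n_segments):
    """Piecewise aggregate approximation: mean of each near-equal segment."""
    return np.array([seg.mean() for seg in np.array_split(x, n_segments)])


def sax(x, n_segments=8, alphabet_size=4):
    """Return the SAX word of x as a string over 'a', 'b', ..."""
    # Breakpoints split the standard normal into equiprobable regions.
    breakpoints = [
        NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)
    ]
    symbols = np.searchsorted(breakpoints, paa(znorm(x), n_segments))
    return "".join(chr(ord("a") + int(s)) for s in symbols)


def ema(x, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = np.empty(len(x))
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = alpha * x[i] + (1 - alpha) * out[i - 1]
    return out


def macd_histogram(x, fast=12, slow=26, signal=9):
    """MACD histogram: MACD line minus its signal line (assumed spans)."""
    x = np.asarray(x, dtype=float)
    macd = ema(x, fast) - ema(x, slow)
    return macd - ema(macd, signal)


def hybrid_sax(x, n_segments=8, alphabet_size=4):
    """Assumed hybrid form: (SAX of the series, SAX of its MACD histogram)."""
    return (
        sax(x, n_segments, alphabet_size),
        sax(macd_histogram(x), n_segments, alphabet_size),
    )


if __name__ == "__main__":
    series = np.sin(np.linspace(0, 4 * np.pi, 128))
    print(hybrid_sax(series))
```

In a clustering pipeline of the kind the abstract describes, a distance defined over such pairs of words would be computed for every pair of series and passed to k-medoids, which needs only pairwise distances and not cluster means. The distance itself is not given in the abstract, so it is left out of this sketch.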
