A novel two-level clustering method for time series data analysis

Clustering analysis has been applied in a wide variety of fields, such as biology, medicine, and economics. For time series clustering, dimension reduction methods such as data sampling or the piecewise aggregate approximation (PAA) algorithm are often applied to reduce the data dimension before clustering. As a consequence, subsequence information may be overlooked. However, time series that yield the same sampled data may be clustered differently once subsequence information is taken into account. In this paper, we propose a novel two-level clustering method named 2LTSC (two-level time series clustering), which considers both the whole time series, denoted as level-1 and handled in the first level, and the subsequence information of the time series, denoted as level-2 and handled in the second level. The level-2 subsequences may differ in length, and the proposed 2LTSC method accounts for this in the second level. Experimental evaluation shows that the proposed two-level clustering method, which considers two different time granularities at the same time, can provide different and deeper viewpoints for time series clustering analysis.
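To illustrate the motivation stated above, the following minimal sketch shows how PAA can map two different time series to the same reduced representation, so that their subsequence differences are lost before clustering. It is not the 2LTSC method itself; the function name `paa` and the two toy series are assumptions made only for this example, and the sketch assumes the series length is divisible by the number of segments.

```python
def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each equal-width segment.

    Illustrative sketch only; assumes len(series) is divisible by n_segments.
    """
    n = len(series)
    if n_segments <= 0 or n % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")
    width = n // n_segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


# Two hypothetical series that differ point by point within each segment.
series_a = [1, 3, 2, 2, 5, 5, 4, 6]
series_b = [2, 2, 3, 1, 4, 6, 5, 5]

reduced_a = paa(series_a, 4)
reduced_b = paa(series_b, 4)

print(reduced_a)               # [2.0, 2.0, 5.0, 5.0]
print(reduced_b)               # [2.0, 2.0, 5.0, 5.0]
print(series_a == series_b)    # False
```

Both series reduce to the same four segment means, so a clustering step that sees only the reduced data cannot separate them. This is the kind of case in which a second level that examines subsequences could lead to a different grouping.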
