Clustering of time series data - a survey

Time series clustering has been shown to be effective in providing useful information in various domains. There seems to be an increased interest in time series clustering as part of the effort in temporal data mining research. To provide an overview, this paper surveys and summarizes previous works that investigated the clustering of time series data in various application domains. The basics of time series clustering are presented, including the general-purpose clustering algorithms commonly used in time series clustering studies, the criteria for evaluating the performance of the clustering results, and the measures used to determine the similarity or dissimilarity between two time series, whether they are compared in the form of raw data, extracted features, or model parameters. Past research is organized into three groups, depending on whether it works (1) directly with the raw data in either the time or the frequency domain, (2) indirectly with features extracted from the raw data, or (3) indirectly with models built from the raw data. The uniqueness and limitations of previous research are discussed, and several possible topics for future research are identified. Moreover, the areas to which time series clustering has been applied are summarized, including the sources of data used. It is hoped that this review will serve as a stepping stone for those interested in advancing this area of research.
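As a rough illustration of the raw-data-based group, and of how a general-purpose clustering algorithm is paired with a dissimilarity measure, the sketch below (plain Python, standard library only) computes the lock-step Euclidean distance and the dynamic time warping (DTW) distance between two series, then feeds a DTW distance matrix into agglomerative clustering with average linkage. The function names, the squared-difference local cost inside DTW, the absence of a warping window, and the choice of average linkage are all illustrative assumptions, not a method prescribed by the survey.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Lock-step Euclidean distance: point i of a is compared with point i of b."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs series of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance (no warping window, squared local cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])


def average_linkage(series: List[Sequence[float]], k: int, dist=dtw) -> List[List[int]]:
    """Agglomerative clustering with average linkage, stopping at k clusters."""
    n = len(series)
    d = [[dist(series[i], series[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one singleton per series
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                pairs = [d[i][j] for i in clusters[p] for j in clusters[q]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, p, q)
        _, p, q = best
        clusters[p] += clusters[q]  # merge the closest pair of clusters
        del clusters[q]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    data = [[0, 1, 2, 3, 4], [0, 0, 1, 2, 3, 4], [4, 3, 2, 1, 0], [4, 4, 3, 2, 1, 0]]
    print(average_linkage(data, k=2))  # [[0, 1], [2, 3]]
```

Euclidean distance only compares equal-length series point by point (it raises on the example data above), whereas DTW aligns [0, 1, 2, 3, 4] with the delayed [0, 0, 1, 2, 3, 4] at zero cost, so the two rising series share one cluster and the two falling series the other. Building the full matrix takes O(n²) DTW evaluations, each O(len(a)·len(b)), which is one reason warping-window constraints or lower bounds are often added in practice.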

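The model-based group can be sketched in the same hedged way: each series is summarized by the parameters of a fitted model, and the parameter vectors, not the raw values, are clustered. Purely as an assumption for illustration, each series below is mean-centred and fitted with an autoregressive AR(p) model by ordinary least squares, the Euclidean distance between coefficient vectors serves as the dissimilarity, and k-means (seeded with the first k points so the result is deterministic) groups the vectors. The simulated AR(1) data, helper names, and parameter values are hypothetical; published model-based measures differ in how they compare fitted models.

```python
import math
import random
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def fit_ar(x: Sequence[float], p: int) -> List[float]:
    """Least-squares AR(p) coefficients of the mean-centred series (no intercept)."""
    mu = sum(x) / len(x)
    z = [v - mu for v in x]
    A = [[0.0] * p for _ in range(p)]  # X^T X
    b = [0.0] * p                      # X^T y
    for t in range(p, len(z)):
        lags = [z[t - k - 1] for k in range(p)]  # z[t-1], ..., z[t-p]
        for i in range(p):
            b[i] += lags[i] * z[t]
            for j in range(p):
                A[i][j] += lags[i] * lags[j]
    return solve(A, b)


def kmeans(points: List[List[float]], k: int, iters: int = 100) -> List[int]:
    """Lloyd-style k-means; the first k points seed the centroids (deterministic)."""
    centers = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def simulate_ar1(phi: float, n: int, seed: int) -> List[float]:
    """Hypothetical test data: x_t = phi * x_{t-1} + standard Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


if __name__ == "__main__":
    params = [(0.9, 1), (-0.9, 2), (0.8, 3), (-0.7, 4)]  # (phi, seed) per series
    series = [simulate_ar1(phi, 300, seed) for phi, seed in params]
    coefs = [fit_ar(s, 2) for s in series]
    print(kmeans(coefs, 2))
```

Because the fitted coefficients describe the dependence structure rather than the observed values, the two series simulated with positive lag-1 coefficients are expected to be grouped together even though their observations are generated independently, which is the essential difference from comparing raw data directly.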