A clustering procedure for exploratory mining of vector time series

A two-step procedure is developed for the exploratory mining of real-valued vector (multivariate) time series using partition-based clustering methods. The proposed procedure was tested on model-generated data, several sets of sensor-based process data, and simulation data. The test results indicate that, when the number of clusters in the data is known a priori, the proposed procedure produces better clustering results than a hidden Markov model (HMM)-based clustering method. Two existing validity indices were tested and found ineffective in determining the actual number of clusters. Determining the appropriate number of clusters without a priori knowledge remains a known open research problem, not only for the proposed procedure but also for the HMM-based clustering method, and further development is necessary.
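
The abstract does not spell out the two steps of the procedure, so the sketch below is not the authors' method. It is a minimal Python illustration, under stated assumptions, of the two generic ingredients the abstract names: partition-based clustering of vector time series and a validity index for comparing candidate numbers of clusters. It assumes all series share the same length and dimension and flattens each one into a single feature vector. It uses plain k-means with a deterministic farthest-first initialization as the partition-based method and the crisp Dunn index as the validity index; the abstract does not say which two indices were tested. The helper names (`flatten`, `kmeans`, `dunn_index`, `make_series`) and the toy data are hypothetical.

```python
import math

def flatten(series):
    """Turn a T x d vector time series into one length-T*d feature vector.
    Assumption: every series has the same length T and dimension d."""
    return [x for obs in series for x in obs]

def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def init_centers(points, k):
    """Farthest-first initialization (chosen here for determinism): start at
    points[0], then repeatedly add the point farthest from its nearest center."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    return centers

def kmeans(points, k, max_iter=100):
    """Plain (hard) k-means: alternate nearest-center assignment and mean update."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def dunn_index(points, labels):
    """Crisp Dunn index: smallest between-cluster distance divided by the
    largest within-cluster diameter (higher means better separated)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    groups = list(groups.values())
    diameter = max((dist(a, b) for g in groups for a in g for b in g), default=0.0)
    separation = min((dist(a, b) for i, g in enumerate(groups)
                      for h in groups[i + 1:] for a in g for b in h), default=0.0)
    return separation / diameter if diameter > 0 else math.inf

def make_series(offset, phase, T=20):
    """Hypothetical toy series: 2 variables, T time steps, level-shifted by offset."""
    return [[offset + math.sin(0.3 * t + phase), offset - math.cos(0.3 * t + phase)]
            for t in range(T)]

# Ten toy series in two groups that differ only by a level shift.
data = [make_series(0.0, 0.1 * i) for i in range(5)] + \
       [make_series(3.0, 0.1 * i) for i in range(5)]
points = [flatten(s) for s in data]

# Score each candidate number of clusters with the validity index.
scores = {k: dunn_index(points, kmeans(points, k)[0]) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
labels, _ = kmeans(points, best_k)
print(best_k, labels)
```

On cleanly separated toy data like this, the index happens to favor k = 2. The abstract reports that the validity indices it tested were ineffective at finding the actual number of clusters in the test data, so this toy result should not be read as evidence that such indices work in general. Flattening compares series point by point, so it also assumes the series are time-aligned.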