Clustering multivariate time series by genetic multiobjective optimization

Summary: Methods for clustering univariate time series often rely on choosing features relevant to the problem at hand and seeking clusters according to their measurements; examples include autoregressive coefficients, spectral measures, time delays at selected frequencies, and special characteristics such as trend and seasonality. In this context, features based on indexes of goodness-of-fit seem worthy of special attention. Similar approaches have been suggested for clustering sets of multivariate time series. For example, clusters of regional economies may be formed based on sets of macroeconomic time series for each country. In a multivariate framework, however, the features of interest are more difficult to extract than for univariate time series. Indeed, multivariate time series may differ not only in structure or pairwise correlation but in dimensionality and internal correlation as well. We propose some measures of predictability and interpolability as indexes of goodness-of-fit for multivariate time series that may serve as useful features for finding clusters in the data. The capability of a clustering method to distinguish clusters of multivariate time series may be evaluated by using several internal cluster validity criteria. As each criterion is known to measure some special characteristics of the extracted features, multiobjective clustering methods and a genetic algorithm implementation are used to perform this evaluation. The concept of Pareto optimality in multiobjective genetic algorithms is used to search over multiple criteria simultaneously. The advantage of using genetic algorithms for multiobjective optimization is that they maintain a population of solutions, most of them non-dominated in the Pareto sense, so that the whole Pareto front may be provided in a single run.
The effectiveness of the measures of predictability and interpolability in conjunction with the multiobjective genetic optimization procedure for outlining the cluster structure of a set of multivariate time series will be studied on a set of real time series data. Furthermore, a simulation experiment will be presented to compare the performance of the proposed procedure with procedures arising from alternative approaches.
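
To make the notion of Pareto non-dominance used in the summary concrete, the following is a minimal sketch and not the authors' implementation. It assumes every candidate partition has already been scored on each validity criterion and that all criteria are to be minimized; the function names `dominates` and `pareto_front` and the sample scores are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a Pareto-dominates b (all criteria minimized).

    a dominates b when a is no worse than b on every criterion
    and strictly better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of the non-dominated score vectors."""
    front = []
    for i, s in enumerate(scores):
        dominated = any(dominates(t, s) for j, t in enumerate(scores) if j != i)
        if not dominated:
            front.append(i)
    return front


# Hypothetical scores of four candidate partitions on two validity criteria.
candidates = [(0.20, 0.90), (0.35, 0.40), (0.50, 0.45), (0.80, 0.10)]
print(pareto_front(candidates))  # indices of the non-dominated candidates
```

Here the third candidate is worse than the second on both criteria, so it is dropped, while the other three each represent a different trade-off and remain on the front. A multiobjective genetic algorithm applies this kind of comparison to its whole population at every generation, which is why a single run can return a set of trade-off solutions rather than one partition.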
