Clustering of time series using quantile autocovariances

Time series clustering is an active research topic with applications in many fields. Unlike conventional clustering of multivariate data, time series evolve over time, so the notion of similarity between objects must take into account the dynamics of the series. In this paper, a distance measure that compares quantile autocovariance functions is proposed to perform clustering of time series. Quantile autocovariances provide information about the serial dependence structure at different pairs of quantile levels, require no moment conditions, and make it possible to identify dependence features that covariance-based methods are unable to detect. Results from an extensive simulation study show that the proposed metric outperforms or is highly competitive with a range of dissimilarities reported in the literature, exhibiting in particular a high capability to cluster time series generated from a broad range of dependence models. Estimation of the optimal number of clusters is also addressed. For illustrative purposes, the methodology is applied to a real dataset of financial time series.
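
To make the idea concrete, the following is a minimal sketch of how a quantile-autocovariance dissimilarity could be computed. It assumes the usual definition of the lag-l quantile autocovariance at levels (tau, tau') as the covariance between the indicators I(X_t <= q_tau) and I(X_{t+l} <= q_tau'), and it assumes the dissimilarity is the squared Euclidean distance between vectors of sample estimates. The function names, the default lags, and the default quantile levels are illustrative choices, not taken from the paper.

```python
import numpy as np


def quantile_autocovariances(x, lags=(1,), probs=(0.1, 0.5, 0.9)):
    """Sample quantile autocovariances for every lag and every pair of levels.

    Hypothetical helper: the estimator is P_hat(X_t <= q_i, X_{t+l} <= q_j) - tau_i * tau_j,
    with q_i the empirical quantile of level tau_i.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    probs = np.asarray(probs, dtype=float)
    q = np.quantile(x, probs)
    # ind[i, t] = 1.0 if x[t] <= q[i], else 0.0
    ind = (x[None, :] <= q[:, None]).astype(float)
    feats = []
    for lag in lags:
        # joint[i, j] estimates P(X_t <= q_i, X_{t+lag} <= q_j)
        joint = ind[:, : n - lag] @ ind[:, lag:].T / (n - lag)
        feats.append((joint - np.outer(probs, probs)).ravel())
    return np.concatenate(feats)


def qaf_distance(x, y, **kwargs):
    """Squared Euclidean distance between the two feature vectors (assumed form)."""
    d = quantile_autocovariances(x, **kwargs) - quantile_autocovariances(y, **kwargs)
    return float(d @ d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(500)
    # Simple AR(1) series with coefficient 0.8, used only as a contrast to white noise.
    ar = np.zeros(500)
    eps = rng.standard_normal(500)
    for t in range(1, 500):
        ar[t] = 0.8 * ar[t - 1] + eps[t]
    print("noise vs AR(1):", qaf_distance(noise, ar))
    print("noise vs noise:", qaf_distance(noise, rng.standard_normal(500)))
```

Because the features depend only on indicator functions of the observations, no moments of the series need to exist. A matrix of pairwise distances computed this way could then be passed to any standard clustering procedure that accepts a dissimilarity matrix.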
