Quantile autocovariances: A powerful tool for hard and soft partitional clustering of time series

Abstract. A key issue in cluster analysis is determining a proper dissimilarity measure between two data objects, and many pairwise dissimilarities have been proposed for time series. Assuming that the purpose of clustering is to group series according to their underlying dependence structures, a detailed study is carried out of the clustering behavior of a dissimilarity based on comparing estimated quantile autocovariance functions (QAF). Quantile autocovariances provide information about the serial dependence structure that other conventional features cannot capture, which suggests great potential for clustering time series. The asymptotic behavior of the sample quantile autocovariances is studied, and an algorithm is introduced to determine optimal combinations of lags and pairs of quantile levels for clustering. The proposed metric is used to perform hard and soft partitioning-based clustering. First, a broad simulation study examines the behavior of the proposed metric in crisp clustering with the PAM procedure. A novel fuzzy C-medoids algorithm based on the QAF-dissimilarity is then proposed and compared with other fuzzy procedures in a new simulation study conducted to cluster fuzzy scenarios involving AR and GARCH models. In all cases, the QAF-based procedures outperform or are highly competitive with a range of dissimilarities reported in the literature, showing in particular a high capability to cluster conditionally heteroskedastic time series and robustness to the distributional form of the errors. Two specific applications involving air quality data and financial time series illustrate the usefulness of the proposed procedures.
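
The abstract does not spell out the dissimilarity itself, so the following is only a minimal sketch under stated assumptions. It assumes the usual definition of the quantile autocovariance at lag l for quantile levels (tau, tau'), namely the covariance between the indicators I(X_t <= q_tau) and I(X_{t+l} <= q_tau'), estimated by replacing the quantiles with sample quantiles, and it assumes the dissimilarity is the squared Euclidean distance between the vectors of estimated values. The function names `qaf_features` and `qaf_dissimilarity`, the default lags, and the default quantile levels are hypothetical choices for illustration, not the optimal combinations selected by the algorithm mentioned above.

```python
import numpy as np

def qaf_features(x, lags=(1,), taus=(0.1, 0.5, 0.9)):
    """Vector of estimated quantile autocovariances of one series.

    Assumed estimator: mean of I(X_t <= q_tau) * I(X_{t+l} <= q_tau')
    over the T - l available pairs, minus tau * tau'.
    Lags must be positive integers smaller than the series length.
    """
    x = np.asarray(x, dtype=float)
    q = np.quantile(x, taus)                        # sample quantiles
    ind = (x[:, None] <= q[None, :]).astype(float)  # T x r indicator matrix
    feats = []
    for l in lags:
        # r x r matrix of joint frequencies for every pair of levels
        joint = ind[:-l].T @ ind[l:] / (len(x) - l)
        feats.append((joint - np.outer(taus, taus)).ravel())
    return np.concatenate(feats)

def qaf_dissimilarity(x, y, **kwargs):
    """Squared Euclidean distance between two QAF feature vectors."""
    diff = qaf_features(x, **kwargs) - qaf_features(y, **kwargs)
    return float(diff @ diff)

# Illustrative use: white noise versus a simulated AR(1) series.
rng = np.random.default_rng(0)
noise = rng.standard_normal(500)
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.7 * ar1[t - 1] + rng.standard_normal()
d = qaf_dissimilarity(noise, ar1, lags=(1, 2))
```

A pairwise matrix of such values could then be passed to any medoid-based partitioning routine that accepts precomputed dissimilarities, which is the role PAM and the fuzzy C-medoids procedure play in the study.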
