Nonlinear Time Series Clustering Based on Kolmogorov-Smirnov 2D Statistic

Time series clustering assigns a set of time series to groups whose members share certain similarities. It has become an attractive analytic tool because many applications require such classifications. Clustering may also yield more accurate parameter estimates when a group of time series is assumed to share a common model and parameters, especially for short panel time series. Many existing time series clustering methods assume that the time series are linear. However, linearity assumptions often fail to hold. In this paper we consider the problem of clustering nonlinear time series. We propose the use of a two-dimensional Kolmogorov-Smirnov statistic as a distance measure between two time series, which measures the affinity of their nonlinear serial dependence structures. The approach is nonparametric, so no model assumptions are needed. It is illustrated with simulation studies as well as real data examples.
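The idea of comparing serial dependence structures with a two-dimensional KS statistic can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes that each series is embedded as lag-1 pairs (x_{t-1}, x_t), that the two-sample statistic is the Fasano-Franceschini quadrant variant (every point of each sample serves as a quadrant origin, and the two maximal gaps are averaged), and that clustering uses naive average linkage. The helper names (`lag_pairs`, `ks2d_statistic`, `average_linkage`, `simulate`) and the two simulated models are hypothetical choices made for illustration only.

```python
import random
from itertools import combinations


def lag_pairs(series, lag=1):
    """Embed a series as the 2-D points (x_{t-lag}, x_t) (assumed embedding)."""
    return [(series[t - lag], series[t]) for t in range(lag, len(series))]


def quadrant_fractions(points, origin):
    """Fraction of points in each of the four quadrants around `origin`.
    Points tied with an origin coordinate count as lower/left (a simplification)."""
    x0, y0 = origin
    counts = [0, 0, 0, 0]
    for x, y in points:
        counts[2 * (x > x0) + (y > y0)] += 1
    return [c / len(points) for c in counts]


def ks2d_statistic(a, b):
    """Fasano-Franceschini-style 2-D KS statistic between point sets a and b."""
    def max_gap(origins):
        gap = 0.0
        for o in origins:
            fa, fb = quadrant_fractions(a, o), quadrant_fractions(b, o)
            gap = max(gap, max(abs(p - q) for p, q in zip(fa, fb)))
        return gap
    # Average the maximal gaps found using origins from each sample.
    return 0.5 * (max_gap(a) + max_gap(b))


def distance_matrix(all_series, lag=1):
    """Symmetric matrix of pairwise 2-D KS distances between embedded series."""
    pts = [lag_pairs(s, lag) for s in all_series]
    m = len(pts)
    d = [[0.0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        d[i][j] = d[j][i] = ks2d_statistic(pts[i], pts[j])
    return d


def average_linkage(d, k):
    """Naive agglomerative clustering with average linkage, stopped at k groups."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            avg = sum(d[i][j] for i in clusters[p] for j in clusters[q])
            avg /= len(clusters[p]) * len(clusters[q])
            if best is None or avg < best[0]:
                best = (avg, p, q)
        _, p, q = best
        clusters[p] += clusters.pop(q)  # p < q, so index p is unaffected
    labels = [0] * len(d)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def simulate(n, rng, nonlinear, burn_in=100):
    """Hypothetical test models: linear AR(1) x_t = 0.5 x_{t-1} + e_t, or the
    nonlinear threshold model x_t = -0.5 |x_{t-1}| + e_t."""
    x, out = 0.0, []
    for _ in range(n + burn_in):
        x = (-0.5 * abs(x) if nonlinear else 0.5 * x) + rng.gauss(0.0, 1.0)
        out.append(x)
    return out[burn_in:]  # discard burn-in values


rng = random.Random(0)
series = [simulate(250, rng, False) for _ in range(3)] + \
         [simulate(250, rng, True) for _ in range(3)]
D = distance_matrix(series)
labels = average_linkage(D, k=2)
print(labels)
```

Each distance here costs O(n^2) quadrant counts for series of length n, which is fine for a sketch but slow for long series; faster two-dimensional KS algorithms exist, and a library hierarchical clustering routine could replace the hand-written linkage.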
