A fragmented-periodogram approach for clustering big data time series

We propose and study a new frequency-domain procedure for characterizing and comparing large sets of long time series. Instead of using all the information available in the data, which would be computationally very expensive, we propose regularization rules to select and summarize the information most relevant for clustering. Essentially, we suggest using a fragmented periodogram computed around the driving cyclical components of interest and comparing the resulting estimates. The procedure is computationally simple yet able to condense the relevant information in the time series. A simulation exercise shows that the smoothed fragmented periodogram generally performs better than the non-smoothed one and no worse than the complete periodogram for medium to large sample sizes. We illustrate the procedure with a study of the evolution of several stock market indices and show the effect of recent financial crises on the behaviour of these indices. A minimal illustrative sketch of this kind of pipeline is given below.
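
To make the idea concrete, the following is a minimal Python sketch of the kind of pipeline the abstract describes: compute the periodogram of each series, keep (and optionally smooth) only the ordinates in frequency bands around the cyclical components of interest, and feed pairwise distances between these fragments into a hierarchical clustering. The band limits, the Daniell-type moving-average smoothing, the log transform, and the complete-linkage clustering are illustrative assumptions, not the authors' exact specification.

```python
# Hedged sketch of the fragmented-periodogram clustering idea described above.
# Band locations, smoothing span, log scale, and linkage method are illustrative
# assumptions, not the paper's exact procedure.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist


def periodogram(x):
    """Raw periodogram I(w_j) at Fourier frequencies w_j = 2*pi*j/n, j = 1..n//2."""
    n = len(x)
    x = x - x.mean()
    dft = np.fft.rfft(x)[1:n // 2 + 1]           # drop the zero frequency
    return (np.abs(dft) ** 2) / (2 * np.pi * n)


def fragmented_periodogram(x, bands, smooth_span=None):
    """Keep only the ordinates whose frequencies (in radians) fall inside the
    given bands, optionally smoothing first with a simple moving average."""
    n = len(x)
    I = periodogram(x)
    freqs = 2 * np.pi * np.arange(1, len(I) + 1) / n
    if smooth_span:                               # crude Daniell-type smoothing
        kernel = np.ones(smooth_span) / smooth_span
        I = np.convolve(I, kernel, mode="same")
    mask = np.zeros(len(I), dtype=bool)
    for lo, hi in bands:                          # e.g. (0.0, 0.5) for low frequencies
        mask |= (freqs >= lo) & (freqs <= hi)
    return np.log(I[mask] + 1e-12)                # log scale stabilises the variance


def cluster_series(series_list, bands, smooth_span=5, k=3):
    """Euclidean distances between fragmented periodograms, then
    hierarchical (complete-linkage) clustering into k groups."""
    features = np.vstack([fragmented_periodogram(x, bands, smooth_span)
                          for x in series_list])
    dist = pdist(features, metric="euclidean")
    tree = linkage(dist, method="complete")
    return fcluster(tree, t=k, criterion="maxclust")


# Toy usage: two groups of AR(1) series with different low-frequency behaviour.
rng = np.random.default_rng(0)

def ar1(phi, n=512):
    e = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

series = [ar1(0.9) for _ in range(5)] + [ar1(-0.5) for _ in range(5)]
labels = cluster_series(series, bands=[(0.0, 0.5)], k=2)
print(labels)
```

In this toy example only the low-frequency band is retained, which is enough to separate the persistent (phi = 0.9) series from the negatively autocorrelated ones; in practice the bands would be placed around whatever cyclical components drive the series of interest.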
