SAZED: parameter-free domain-agnostic season length estimation in time series data

Season length estimation is the task of identifying the number of observations in the dominant repeating pattern of seasonal time series data. As such, it is a common pre-processing task crucial for various downstream applications. Inferring season length from a real-world time series is often challenging due to phenomena such as slightly varying period lengths and noise. Because existing approaches either require dedicated parameter tuning or perform in a heavily domain-dependent way, practitioners often dedicate considerable effort to preprocessing time series data. To address these challenges, we propose SAZED: spectral and average autocorrelation zero distance density. SAZED is a versatile ensemble of multiple specialized time series season length estimation approaches. The combination of various base methods, selected with respect to domain-agnostic criteria, and a novel seasonality isolation technique allows broad applicability to real-world time series of varied properties. Further, SAZED is theoretically grounded and parameter-free, with a computational complexity of $\mathcal{O}(n \log n)$, which makes it applicable in practice. In our experiments, SAZED was statistically significantly better than every other method on at least one dataset. The datasets used for the evaluation consist of time series data from various real-world domains, sterile synthetic test cases, and synthetic data designed to be seasonal yet have no finite statistical moments of any order.
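To make the task concrete, the following is a minimal sketch of one generic, parameter-free baseline: take the lag of the highest autocorrelation peak after the first negative autocorrelation value, with the autocorrelation computed via FFT in O(n log n) time. It is not the SAZED ensemble itself; the function names `acf_fft` and `estimate_season_length` are hypothetical, and the sketch assumes an evenly sampled series without a strong trend.

```python
import numpy as np


def acf_fft(x):
    """Sample autocorrelation for lags 0..n-1, computed via FFT."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Zero-pad to a power of two >= 2n - 1 so the circular correlation
    # computed by the FFT equals the linear one for lags 0..n-1.
    nfft = 1 << (2 * n - 1).bit_length()
    spec = np.fft.rfft(x, nfft)
    acov = np.fft.irfft(np.abs(spec) ** 2, nfft)[:n]
    return acov / acov[0]


def estimate_season_length(x):
    """Return the lag of the dominant ACF peak, or None if none is found.

    Illustrative baseline only (an assumption of this sketch, not the
    method proposed in the paper).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < 4 or np.ptp(x) == 0:
        return None  # too short or constant: no seasonality to detect
    acf = acf_fft(x)
    # Skip the lobe around lag 0 by starting at the first negative value.
    negative = np.flatnonzero(acf < 0)
    half = len(acf) // 2  # larger lags are estimated from few pairs
    if negative.size == 0 or negative[0] >= half:
        return None
    start = int(negative[0])
    return start + int(np.argmax(acf[start:half]))


# Usage: a noisy sine wave with 12 observations per cycle.
rng = np.random.default_rng(0)
t = np.arange(240)
series = np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(t.size)
print(estimate_season_length(series))
```

A single-heuristic estimator like this is sensitive to trend, noise, and slightly varying period lengths, which is the kind of fragility that motivates combining several base methods.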
